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Introduction 

Surgery  and  radiation  therapies  are  difficult  to  use  in  the  treatment  of  lung  cancer  because  the 
diagnosis  often  occurs  when  patients  already  have  metastasis.  Drug-based  therapies  are  therefore  the 
best  option,  but  intrinsic  and  acquired  drug  resistance  still  makes  the  5-year  survival  rate  for  this  disease 
less  than  15%.  Over  the  years,  many  specific  mechanisms  associated  with  drug  resistance  in  lung  cancer 
have  been  pinpointed,  but  we  are  still  far  from  understanding  how  to  overcome  it.  Combination  drug 
therapy  is  commonly  used  to  enhance  efficacy  and  overcome  drug  resistance  in  cancer,  but  at  present 
the  choice  of  drugs  and  doses  is  based  on  empirical  clinical  experience  alone.  In  this  project  we  have 
used  an  interdisciplinary  approach  based  on  the  mathematics  of  complex  networks  to  identify  drug 
combinations  that  could  be  effective  in  the  therapy  of  lung  cancer. 

This  reports  describes  the  methods  used  and  presents  computational  and  experimental  data 
that  we  have  obtained  during  the  period  of  operations. 
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Publications 


The  results  of  activities  carried  out  in  this  project  have  been  published  in: 

1.  A  Szedlak,  G  Paternostro,  C  Piermarocchi,  Control  of  asymmetric  Hopfield  networks  and 
application  to  cancer  attractors,  PLoS  ONE  9:el05842  (2014);  copy  attached  in  Appendix. 

2.  Trish  P  Tran,  Edison  Ong,  Andrew  P  Hodges,  Giovanni  Paternostro,  Carlo  Piermarocchi, 
Prediction  of  kinase  inhibitor  response  using  activity  profiling,  in-vitro  screening,  and  elastic  net 
regression,  BMC  Systems  Biology  8:74  (2014);  copy  attached  in  Appendix. 
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Summary  of  Key  research  accomplishments 


•  We  have  developed  a  novel  computational  approach  to  simulate  the  signaling  dynamics  in 
gene/transcription  factor  networks  (see  publication  #1).  The  model  allows  for  a  direct  mapping 
of  a  gene  expression  pattern  into  dynamical  attractor  states  and  can  test  different  strategies  to 
disrupt  cancer-specific  attractor  patterns.  We  have  designed  algorithms  to  identify  signaling 
bottlenecks,  which  are  single  nodes  or  strongly  connected  clusters  of  nodes  that  have  a  large 
impact  on  the  signaling.  Bottlenecks  identify  ideal  targets  for  the  rational  design  of  robust 
therapeutic  interventions. 

•  The  method  of  publication  #1  has  been  applied  to  signaling  in  lung  cancer  cell  lines,  specifically 
on  A549,  H358  cell  lines.  The  method  has  identified  targets  that  are  predicted  to  be  associated 
to  a  selective  and  robust  response  to  therapy.  The  full  list  of  targets  is  given  in  Table  1  below. 
Some  of  the  genes  identified  are  consistent  with  current  clinical  and  cancer  biology  knowledge: 
TP53  is  frequently  mutated  in  lung  cancer;  mutations  in  PBX1  have  been  detected  in  non-small- 
cell  lung  cancer;  MAP3K3  and  MAP3K14  are  in  the  MAPK/ERK  pathway,  which  is  a  target  of 
many  novel  therapeutic  agents;  and  SRC  is  a  well  known  oncogene  and  a  candidate  target  in 
lung  cancer.  The  method,  however,  has  also  discovered  new  targets,  in  particular  kinases 
BMPR1B,  TTN,  LCK  and  RIPK3,  which  could  be  targeted  using  kinase  inhibitors. 

•  We  have  developed  a  novel  method  of  statistical  analysis  of  in-vitro  testing  of  drug 
combinations  (see  publication  #2).  The  method  integrates  information  contained  in  networks 
representing  the  signaling  of  drugs  to  their  targets  with  experimental  data  from  in  vitro 
screening.  The  method  uses  the  in  vitro  cell  response  to  single  drugs  or  drug  combinations  as  a 
training  set  to  build  linear  and  nonlinear  regression  models  and  predicts  the  response  of  cells  to 
new  drug  combinations.  Besides  predicting  the  effectiveness  of  untested  drugs,  the  method 
identifies  targets  that  are  statistically  associated  to  drug  sensitivity  in  a  given  cell  line. 

•  The  method  of  publication  #2  was  applied  to  the  A549  lung  cancer  cell  line,  and  we  identified 
specific  kinases  as  important  targets  in  this  type  of  cancer  (TGFBR2,  EGFR,  PHKG1  and  CDK4).  A 
pathway  enrichment  analysis  of  the  set  of  kinases  identified  by  the  method  showed  that  axon 
guidance,  activation  of  Rac,  and  semaphorin  interactions  pathways  are  associated  to  a  selective 
response  to  therapeutic  intervention  in  this  cell  line. 

•  Methods  in  publications  #1  and  #2,  in  conjunction  with  other  computational  and  experimental 
techniques  previously  developed  by  the  two  principal  investigators,  have  been  used  to  identify 
drug  combinations  effective  in  selectively  killing  A549  cell  lines  versus  a  fibroblast  normal  cell 
line  (IMR-90)  as  a  toxicity  control.  Combinations  with  the  highest  selectivity  are  given  in  Table  3 
below.  Overall,  combinations  containing  relatively  high  doses  of  a  PDK1/Aktl/Flt3  Dual  Pathway 
Inhibitor  (A15  in  the  Table)  and  a  HSP90  inhibitor  (AAG),  combined  with  relatively  low  doses  of  a 
PDGFR  (111)  and  a  WNT/BETA  CATENIN/PPAR  inhibitor  (535)  were  associated  with  the  highest 
selective  response  in  A549  adenocarcinoma  cell  lines. 
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Body:  Detailed  description  of  the  methods  as  outlined  in  the  Statement  of  Work  (SOW). 

1.  TASK1  of  SOW:  Collection  of  data  for  attractor  models. 

A  lung  cell  interactome  was  constructed  by  combining  TRANSFAC  and  PhosphoPOINT  data  (Subtask  1 
of  Task  1).  The  lung  network  interactome  we  built  has  "9,000  nodes  and  "45,000  edges.  Gene 
expression  data  was  obtained  from  the  Gene  Expression  Omnibus  (GEO)  database  for  A549 
adenocarcinoma,  H358  non-small  lung  cancer,  and  IMR90  fetal  lung  fibroblast  normal  cell  lines.  The 
model  requires  Boolean  gene  expression  states.  We  have  defined  a  cutoff  for  the  normalized  expression 
values,  and  all  genes  with  expression  below  the  cutoff  are  "off"  and  all  above  are  "on".  Because  the 
signaling  is  based  on  a  model  with  +1  states,  on  states  are  identified  by  the  variable  =  +1  and  off 
states  by  f  “  =  —1,  where  "a"  is  either  normal  (n)  or  cancer  (c). 

This  procedure  provided  the  configurations  corresponding  to  dynamical  attractor  states  in  our 
method  (Subtask  2  of  Task  1).  Figure  1  shows  representative  gene  expression  data  and  an  example  of 
how  the  cut-off  method  was  implemented. 
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Figure  1.  Representative  gene  expression  data  used  in  our  method.  Expression  levels  take  continuous  values,  but 
must  be  made  Boolean  for  our  model.  The  expression  level  cutoff  for  normal  lung  cells  (IMR90,  pictured),  for 
example,  use  a  cutoff  (dotted  line)  of  approximately  4.  This  was  chosen  because  the  number  of  on  states  is  of  the 
same  order  as  the  number  of  off  states,  but  more  importantly  the  number  of  on  and  off  states  is  not  very  sensitive 
to  small  changes  in  the  cutoff.  The  same  cutoff  is  used  for  both  normal  and  cancer  cells.  The  continuous 
distribution  of  expression  levels  is  roughly  the  same  for  normal  and  cancer  cells. 

We  have  defined  drug  inhibitor-kinase  links  for  a  library  containing  about  300  kinase  inhibitors 
using  experimental  surveys  of  kinase  inhibitor  targets.  (Subtask  3  of  Task  1) 

2.  TASK  2  of  SOW:  Development  of  attractor  model  based  on  neural  network  Hopfield  model 

After  making  the  attractor  states  Boolean,  we  encoded  the  states  fn(c)  = 
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in  a  signaling  model  defined  by  the  coupling  matrix 


Ji]=Atj(.tfZ?  +  ftfD'  Eq-  (!) 

where  An  is  the  adjacency  matrix  of  the  lung  cancer  network  interactome  obtained  in  Task  1,  and  N  is 
the  total  number  of  nodes.  The  model  calculates  the  total  signal  arriving  at  node  i  at  time  t  as 

hi(t)  =  'Lrj=1Jij<Jj(t), 

where  the  <X;(t)  is  the  state  of  the  node  i  at  time  t.  The  discrete-time  update  scheme  for  the  dynamical 
evolution  of  the  state  of  the  node  i ,  cr^t),  is  given  by 

<7i(t  +  At)  =  +1  if  hi(t )  >  0, 

<7i(t  +  At)  =  —1  if  ht(t )  <  0, 

and  chosen  randomly  from  +1  if  the  field  is  zero. 

Note  that  we  are  left  with  two  kinds  of  genes:  similarity  nodes,  where  =  ff,  and  differential 
nodes,  where  =  —  (f.  We  have  then  calculated  the  Hamming  distance  between  cell  attractors  and 
the  dynamical  state  of  the  network  (Subtask  1  of  Task  2). 

This  distance  has  been  used  to  identify  the  most  sensitive  single  genes  in  the  network  using  the 
following  algorithm: 

1.  Begin  with  all  genes  set  in  the  normal/cancer  state. 

2.  Force  gene  i=l  away  from  the  initial  state  and  count  the  number  of  genes  that  flip  as  a  result. 

3.  Repeat  for  i=2...N,  where  N  is  the  number  of  genes  in  the  system. 

This  algorithm  is  effective  in  identifying  bottleneck  genes.  Bottlenecks  are  genes  which,  when 
targeted  by  inhibitors,  drive  the  cell  far  away  from  its  initial  state.  We  always  try  to  target  bottlenecks 
with  ([  =  +1  and  =  —1  so  that  cancer  cells  are  driven  away  from  their  initial  state,  while  the  normal 
cells  are  left  unaltered. 

We  used  both  a  one-attractor  state  (p=l)  and  a  two-attractor  state  ( p=2 )  signaling  model.  In  the 
one  attractor  [p=l)  model  the  Jtj  only  contains  one  term  in  Eq.  (1).  Both  models  behave  like  a  simple 
Ising  magnet,  except  that  the  interactions  are  not  symmetric:  the  expression  of  gene  /  may  affect  the 
expression  of  gene  j,  but  j  does  not  necessarily  affect  /.  This  asymmetry  makes  both  the  p=l  and  p=2 
systems  more  vulnerable  to  external  control.  The  p=2  system  has  one  property  that  the  p=l  does  not 
have,  however:  all  edges  between  similarity  and  differential  genes  are  effectively  removed,  while  all 
edges  connecting  similarity  genes  to  each  other  or  differential  genes  to  each  other  remain.  The  network 
fully  separates  into  two  independent  networks,  the  similarity  network  and  the  differential  network. 
When  looking  for  nodes  to  target  in  the  p=2  case,  then,  all  similarity  nodes  can  be  safely  ignored  and  the 
problem  space  is  significantly  reduced.  Aside  from  the  edge  deletion,  however,  p=l  and  p=2  behave  very 
similarly.  An  example  of  genes  identified  by  this  method  and  their  impact  /  in  terms  of  flipped  genes  in 
the  iteractome,  is  shown  in  Table  1. 
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Part  of  the  software  was  implemented  on  the  high  performance  computer  cluster  facility  at 
MSU  (Subtask  2  of  Task  2).  The  algorithm  however  was  sufficiently  fast  that  parallelization  of  the  code 
was  not  necessary. 


I/A 

I/H 

P  =  1 

P  =  2 

P  =  1 

P  =  2 

Gene 

I 

Gene 

I 

Gene 

I 

Gene 

I 

HNF1A 

29 

OR5I1 

35 

HNF1A 

29 

HMX1 

41 

TMEM37 

22 

TMEM37 

25 

MAP3K3 

18 

PBX1 

38 

OR5I1 

20 

HNF1A 

23 

TP53 

18 

MYB 

25 

UNC 

MAP3K14 

19 

POSTN 

21 

RUNX1 

17 

ITGB2 

20 

MAP3K3 

18 

RORA 

18 

RORA 

16 

TNFRSF10A 

18 

CON 

MAP3K14 

19 

SRC 

15 

TTN 

16 

BMPR1B 

18 

SRC 

14 

BMPR1B 

7 

RIPK3 

6 

LCK 

8 

Table  1.  Representative  genes  to  be  targeted  for  a  selective  killing  of  A549  (left)  and  H358  (right)  cell  line  versus  a 
control  IMR90  cell  line.  The  impact  I  of  each  gene  for  the  $p$=l  and  $p$=2  models  were  calculated  and  ranked. 

The  constrained  case  (CON  in  the  table)  refers  to  target  that  are  kinases  and  are  expressed  in  the  cancer  case.  The 
calculation  is  based  on  the  selective  response  of  I  =  IMR-90  (normal),  A  =  A549  (cancer),  and  H358  (cancer). 

3.  TASK  3  of  SOW:  First  set  of  experiments  at  the  high-throughput  screening  facility 

We  have  carried  out  a  first  high-throughput  screening  of  single  drug  and  drug  pair  experiments 
(Subtask  1  of  Task  3).  The  original  SOW  only  included  single  drug  response,  but  we  realized  that  a 
screening  with  pairs  would  give  better  selectivity.  244  kinase  inhibitors  (KIs)  of  the  EMD  drug  library 
were  screened  at  lOOOnM  individually  and  the  treatment  lasted  for  72  hours.  To  quantify  a  selective 
response  of  a  cancer  cell  line  with  respect  to  a  control  normal  cell  line,  we  define  the  selectivity  5  of  a 
single  drug  or  drug  combination  as 


vc 


where  vN  indicates  the  viability  of  normal  cells  (IMR90)  after  treatment,  and  vc  the  viability  of  cancer 
cells  (A549)  after  treatment.  From  the  screening  of  the  244  KIs,  the  top  hit  was  PDK1/Aktl/Flt3  Dual 
Pathway  Inhibitor  (CAS  #  331253-86-2)  as  ranked  by  selectivity.  For  the  secondary  screen  (pair 
combination  of  drugs),  we  used  the  PDK1/Aktl/Flt3  Dual  Pathway  Inhibitor  as  the  starting  point  and 
combined  this  compound  with  the  other  KIs  as  a  drug  pair  combination.  The  dose  of  PDK1/Aktl/Flt3 
Dual  Pathway  Inhibitor  was  studied  to  ensure  proper  dosing  range  and  minimize  toxicity.  We  used 
125nM,  which  maintains  the  normal  cell  line  IMR-90's  viability  >90%.  For  the  other  243  KIs  we  used  the 
standard  dose  of  lOOOnM.  Several  pairs  in  the  secondary  screen  showed  very  high  selectivity.  The  top 
hit  from  the  secondary  screen  of  the  library  was  Alsterpaullone  2-cyanoethyl  (CAS  #  852529-97-0)  with  a 
selectivity  of  S=  6.14  for  the  pair  (see  Figure  2). 
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Drug  Pair's  Selectivity  for  A549 


A12  D17  K08  021  P15  E13  C09  A10  003  N05 


Drug  Pairs 


Figure  2:  Representative  data  from  TASK  3  in  SOW.  Experimental  results  of  the  top  ten  most  selective  drugs 
(lOOOnM)  when  paired  with  PDK1/Aktl/Flt3  Dual  Pathway  Inhibitor  at  125nM.  Selectivity  is  the  IMR-90  to  A549 
viability  ratio.  The  3  digit  codes  identify  the  compounds:  A12:  Alsterpaullone,  2-Cyanoethyl  (CAS  852529-97-0); 
D17:  Cdk2/9  Inhibitor  (CAS  507487-89-0);  K08:  K-252a,  Nocardiopsis  sp.  (CAS  97161-97-2);  021:  Staurosporine, 
Streptomyces  sp.  (CAS  62996-74-1);  P15:  WHI-P180,  Hydrochloride  (CAS  211555-08-7);  E13:  Go  6976  (CAS  136194- 
77-9);  C09:  Compound  56  (CAS  171745-13-4);  A10:  Alsterpaullone  (CAS  237430-03-4);  003:  AG  1478,  Selective 
inhibitor  of  epidermal  growth  factor  receptor  (EGFR)  protein  (CAS  175178-82-2);  N05:  Reversine  (CAS  656820-32- 
5). 


We  have  also  carried  out  measurements  on  random  combinations  of  drugs  (Subtask  2  of  Task 
3)  including  compounds  from  the  EMD  library  and  other  drugs.  A  representative  data  set  of  random 
combinations  is  given  in  Table  2. 


K04 

A12 

A15 

E03 

111 

628 

AAG 

263 

MK2 

662 

535 

Type 

IMR90/A549 

Selectivity 

0 

2 

3 

0 

4 

0 

0 

1 

0 

0 

0 

R 

3.46 

1 

2 

3 

0 

3 

0 

0 

2 

0 

0 

1 

R 

3.81 

1 

3 

0 

4 

1 

1 

1 

0 

0 

1 

0 

R 

1.71 

2 

0 

4 

4 

1 

1 

0 

1 

2 

0 

0 

R 

3.40 

4 

3 

3 

1 

0 

3 

0 

0 

0 

1 

0 

R 

4.90 

3 

3 

2 

4 

0 

1 

0 

0 

0 

2 

1 

R 

1.17 

3 

2 

2 

1 

1 

3 

0 

1 

0 

0 

0 

R 

3.12 

4 

3 

2 

4 

3 

1 

1 

0 

0 

1 

1 

R 

2.11 

0 

4 

3 

1 

3 

1 

1 

1 

1 

0 

0 

R 

1.82 

1 

3 

4 

0 

2 

1 

0 

1 

0 

2 

0 

R 

7.40 

Table  2.  Representative  data  with  measurements  of  selectivity  on  A549  cells  versus  IMR90.  Drugs  were  combined 
at  different  doses  ranked  from  0  to  4.  The  drug  combinations  obtained  in  this  table  were  obtained  randomly. 
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4.  TASK  4  of  SOW:  Running  and  analysis  of  simulations 

We  have  run  simulations  to  predict  the  therapeutic  effectiveness  of  combinations  of  kinase 
inhibitors  according  to  the  attractor  model  (Subtask  1  of  Task  4).  We  have  tested  our  model  against  the 
experimental  results  discussed  in  the  previous  section.  We  used  the  available  kinase  inhibition  profiling 
for  a  drug  library  to  determine  which  kinase  are  shut  off  by  each  drug.  We  applied  the  drugs  to  both  the 
normal  and  cancer  cells  for  both  p=l  and  p=2,  and  compared  the  resulting  viabilities  from  the 
experiment,  vexp,  to  the  model, 


vmodel~e 


where  "m"  is  the  magnetization  of  the  system  along  the  attractor  state  (see  Figure  3). 

Note  that  the  results  for  p=l  and  p=2  are  roughly  the  same,  and  only  the  p=l  result  is  shown. 
The  black  circles  indicate  the  viability  of  the  normal  cells  for  a  given  drug  combination,  which  is  the  drug 
A15  (a  PDK1/AKT1/FLT3  Inhibitor)  and  the  drug  code  next  to  the  black  circles,  and  the  connected  red  x's 
are  the  cancer  viabilities  for  the  same  drug  combination.  This  shows  only  some  of  the  140  drugs  tested. 

The  most  remarkable  result  is  that  without  any  kind  of  fitting,  ~95%  of  the  blue  lines  (including  those 
not  pictured)  have  a  positive  slope,  meaning  that  if  the  experiment  showed  that  the  normal/cancer 
cells  fared  better  than  the  cancer/normal  cells,  our  model  showed  that  as  well.  Currently  we  cannot 
reproduce  the  rank  of  the  effectiveness  of  the  drug  combinations,  but  we  can  quite  accurately  predict 
whether  a  combination  will  have  a  selectivity  greater  than  or  less  than  1. 


Figure  3.  Computational  versus  experimental  viability  for  IMR90  and  A549.  All  drug  codes  shown  are  combined 
with  A15  (a  PDK1/AKT1/FLT3  Inhibitor).  The  experimental  results  are  compared  with  the  p=l  model  predictions 
(p=2  is  similar).  A  positive  slope  means  that  there  is  positive  correlation  between  the  experimental  and  model 
results:  the  experiment  showed  that  normal  cells  treated  with  (A15+016),  for  example,  fared  better  than  cancer 
cells  treated  with  the  same  drugs,  which  our  model  predicts  as  well.  Note  that  while  only  11  drug  combinations  are 
shown,  140  were  tested,  a  promising  95%  of  which  had  a  positive  slope. 
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We  have  examined  combinations  that  are  more  effective  using  a  learning  machine  method 
known  as  elastic  net  regression  (Subtask  2  of  Task  4).  The  method  uses  the  in  vitro  lung  cancer  A549  cell 
line  response  of  single  drugs  and  drug  pair  combinations  as  a  training  set  to  build  a  regression  model. 
Besides  predicting  the  effectiveness  of  untested  drugs,  the  method  identifies  sets  of  kinases  that  are 
statistically  associated  to  drug  sensitivity  in  lung  cancer.  More  specifically,  we  built  a  regression  model 
that  predicts  the  response  of  a  cell  line  to  a  drug  or  drug  combination  i.  The  response  we  predict  is  the 
normal  and  cancer  cell  viability,  from  which  the  selectivity  can  be  derived.  For  this  purpose,  we  define  a 
regression  problem  in  which  we  use  the  residual  activity  of  the  kinase  k  under  the  effect  of  drug  r,  which 
we  indicate  as  Ak  i,  as  predictors  of  the  viability.  The  response  can  be  written  as 

vi  =  P o  +  Pi^i ,i  +  — F  PpApi .  (2) 

A  fitting  procedure  based  on  a  training  set  of  measurements  produces  the  coefficients  (/?0,/?i,  -,Pp)- 
Equation  (2)  can  then  be  used  to  predict  the  viability  of  a  new  drug  that  has  not  been  tested,  but  of 
which  the  profiling  information  is  available.  The  coefficients  /?fe  provide  a  measure  of  the  sensitivity  of  a 
given  cell  line  due  to  alterations  in  the  activity  of  kinase  k.  The  method  was  applied  to  the  A549  lung 
cancer  cell  line,  and  we  identified  specific  kinases  known  to  have  an  important  role  in  this  type  of  cancer 
(TGFBR2,  EGFR,  PHKG1  and  CDK4).  A  pathway  enrichment  analysis  of  the  set  of  kinases  identified  by  the 
method  showed  that  axon  guidance,  activation  of  Rac,  and  semaphorin  interactions  pathways  are 
associated  to  a  selective  response  to  therapeutic  intervention  in  this  cell  line. 

The  formulation  of  the  problem  in  terms  of  bottlenecks ,  as  defined  above  has  allowed  us  to 
derive  some  general  properties  on  the  controllability  of  a  cellular  network  based  on  the  mathematical 
properties  of  network  topology.  In  particular,  in  the  first  publication,  we  have  demonstrated  a  theorem 
with  bounds  on  the  minimum  number  of  nodes  that  guarantee  control  of  bottlenecks  consisting  of 
strongly  connected  components  (Subtask3  of  Task  4).  This  information  could  be  important  in  deciding 
the  number  of  proteins  than  needs  to  be  targeted  for  an  efficient  therapy.  We  have  also  compared  the 
controllability  of  the  lung  cancer  network  with  the  controllability  of  an  algorithmically-assembled 
specific  B  cell  gene  regulatory  network  reconstructed  from  gene  expression  data  (Subtask3  of  Task  4).  In 
contrast  to  the  lung  cancer  network,  which  is  generic  but  experimentally  validated,  the  B  cell  network 
has  a  much  higher  connectivity,  and  is  harder  to  control. 

5.  TASK  5  of  SOW:  Second  set  of  experiments  and  test  of  hypothesis 

We  have  carried  out  measurement  of  drug  response  of  cells  under  combinations  involving  up  to 
10  drugs  (Subtask  1).  We  have  included  drugs  that  were  identified  using  the  KIEN  method  above  and 
we  used  a  dose  optimization  method.  Cell  survival  was  assessed  by  luciferase-based  assay,  ATPlite™ 
(PerkinElmer,  CA,  USA),  which  determines  viable  cell  numbers  by  measuring  the  presence  of  ATP  in  all 
metabolically  active  cells.  For  the  measurement  of  cell  viability,  A549  and  IMR-90  cells  were  plated  in 
384-well  plates.  Subsequently,  the  cells  were  treated  with  the  drugs  and  72  hours  later,  the  ATPlite 
assay  was  performed  according  to  the  manufacturer's  protocol,  and  luminescence  was  read  with  an 
Analyst  HT  instrument.  Each  combination  was  measured  in  triplicates. 


Table  3  shows  representative  data  with  results  of  the  measurements.  Some  of  the  combinations 
reduce  the  viability  of  cancer  cells  almost  to  zero,  still  significantly  preserving  the  viability  of  IMR90. 
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K04 

A12 

A15 

E03 

111 

628 

AAG 

263 

MK2 

662 

535 

IMR90 

Selectivity 

A549 

3 

1 

4 

1 

2 

3 

3 

1 

1 

1 

1 

0.6162 

297.7349 

0.0021 

2 

2 

4 

2 

2 

3 

3 

1 

1 

1 

2 

0.7287 

281.8844 

0.0026 

2 

2 

4 

1 

2 

1 

3 

1 

1 

1 

2 

0.7257 

273.3291 

0.0027 

2 

1 

4 

1 

2 

2 

3 

1 

1 

2 

2 

0.6719 

244.5041 

0.0027 

1 

1 

3 

1 

2 

3 

3 

1 

1 

2 

2 

0.5578 

225.7526 

0.0025 

2 

2 

4 

2 

2 

3 

4 

1 

1 

1 

2 

0.7177 

221.7178 

0.0032 

2 

1 

4 

1 

2 

2 

3 

1 

3 

2 

1 

0.5110 

216.8450 

0.0024 

2 

1 

4 

1 

2 

2 

3 

1 

1 

2 

1 

0.5600 

213.5330 

0.0026 

2 

2 

4 

4 

2 

4 

3 

1 

1 

1 

2 

0.5800 

210.7142 

0.0028 

2 

1 

4 

3 

3 

3 

3 

1 

1 

1 

2 

0.5616 

205.5397 

0.0027 

Table  3:  Representative  data  with  measurements  of  the  highest  selectivity  on  A549  cells  versus  IMR90.  Drugs  were 
combined  at  different  doses  ranked  from  0  to  4.  Columns  labeled  IMR90  and  A549  indicate  the  viability  of  these 
cell  lines  treated  with  the  combination,  while  selectivity  is  the  ratio  of  the  normal  versus  cancer  viability. 


14 


Conclusions 

In  this  early  concepts  project,  we  have  applied  mathematical  tools  of  networks  theory  and 
nonlinear  dynamics  to  the  problem  of  identifying  drug  combinations  that  could  be  more  effecting  in  the 
treatment  of  lung  cancer.  In  particular,  we  have  developed  software  able  to  calculate  the  effect  of  drug 
combinations  on  the  signaling  of  A549  adenocarcinoma,  H358  non-small  lung  cancer,  and  IMR90 
fibroblast  normal  cell  lines  (Milestone  1).  Two  articles  on  the  computational  and  theoretical  results  on 
controllability  of  cancer  networks  and  identification  of  target  genes  in  lung  cancer  have  been  published 
(Milestone  2).  Experimentally,  we  found  drugs  combinations  with  up  to  10  drugs  that  are  very  effective 
in  killing  A549  cells  versus  the  control  IMR90  cells  in  an  in-vitro  setting  (Milestone  3). 
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Abstract 

The  asymmetric  Hopfield  model  is  used  to  simulate  signaling  dynamics  in  gene  regulatory  networks.  The  model  allows  for  a 
direct  mapping  of  a  gene  expression  pattern  into  attractor  states.  We  analyze  different  control  strategies  aimed  at 
disrupting  attractor  patterns  using  selective  local  fields  representing  therapeutic  interventions.  The  control  strategies  are 
based  on  the  identification  of  signaling  bottlenecks,  which  are  single  nodes  or  strongly  connected  clusters  of  nodes  that 
have  a  large  impact  on  the  signaling.  We  provide  a  theorem  with  bounds  on  the  minimum  number  of  nodes  that  guarantee 
control  of  bottlenecks  consisting  of  strongly  connected  components.  The  control  strategies  are  applied  to  the  identification 
of  sets  of  proteins  that,  when  inhibited,  selectively  disrupt  the  signaling  of  cancer  cells  while  preserving  the  signaling  of 
normal  cells.  We  use  an  experimentally  validated  non-specific  and  an  algorithmically-assembled  specific  B  cell  gene 
regulatory  network  reconstructed  from  gene  expression  data  to  model  cancer  signaling  in  lung  and  B  cells,  respectively. 
Among  the  potential  targets  identified  here  are  TP53,  FOXM1,  BCL6  and  SRC.  This  model  could  help  in  the  rational  design  of 
novel  robust  therapeutic  interventions  based  on  our  increasing  knowledge  of  complex  gene  signaling  networks. 
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Introduction 

The  vision  behind  systems  biology  is  that  complex  interactions 
and  emergent  properties  determine  the  behavior  of  biological 
systems.  Many  theoretical  tools  developed  in  the  framework  of 
spin  glass  models  are  well  suited  to  describe  emergent  properties, 
and  their  application  to  large  biological  networks  represents  an 
approach  that  goes  beyond  pinpointing  the  behavior  of  a  few 
genes  or  metabolites  in  a  pathway.  The  Hopfield  model  [1]  is  a 
spin  glass  model  that  was  introduced  to  describe  neural  networks, 
and  that  is  solvable  using  mean  field  theory  [2].  The  asymmetric 
case,  in  which  the  interaction  between  the  spins  can  be  seen  as 
directed,  can  also  be  exacty  solved  in  some  limits  [3] .  The  model 
belongs  to  the  class  of  attractor  neural  networks,  in  which  the  spins 
evolve  towards  stored  attractor  patterns,  and  it  has  been  used  to 
model  biological  processes  of  high  current  interest,  such  as  the 
reprogramming  of  pluripotent  stem  cells  [4],  Moreover,  it  has 
been  suggested  that  a  biological  system  in  a  chronic  or  therapy- 
resistant  disease  state  can  be  seen  as  a  network  that  has  become 
trapped  in  a  pathological  Hopfield  attractor  [5] .  A  similar  class  of 
models  is  represented  by  Random  Boolean  Networks  [6],  which 
were  proposed  by  Kauffman  to  describe  gene  regulation  and 
expression  states  in  cells  [7],  Differences  and  similarities  between 


the  Kauffman-type  and  Hopfield-type  random  networks  have 
been  studied  for  many  years  [8-11], 

In  this  paper,  we  consider  an  asymmetric  Hopfield  model  built 
from  real  (even  if  incomplete  [12,13])  cellular  networks,  and  we 
map  the  spin  attractor  states  to  gene  expression  data  from  normal 
and  cancer  cells.  We  will  focus  on  the  question  of  controling  of  a 
network’s  final  state  (after  a  transient  period)  using  external  local 
fields  representing  therapeutic  interventions.  To  a  major  extent, 
the  final  determinant  of  cellular  phenotype  is  the  expression  and 
activity  pattern  of  all  proteins  within  the  cell,  which  is  related  to 
levels  of  mRNA  transcripts.  Microarrays  measure  genome-wide 
levels  of  mRNA  expression  that  therefore  can  be  considered  a 
rough  snapshot  of  the  state  of  the  cell.  This  state  is  relatively  stable, 
reproducible,  unique  to  cell  types,  and  can  differentiate  cancer 
cells  from  normal  cells,  as  well  as  differentiate  between  different 
types  of  cancer  [14,15].  In  fact,  there  is  evidence  that  attractors 
exist  in  gene  expression  states,  and  that  these  attractors  can  be 
reached  by  different  trajectories  rather  than  only  by  a  single 
transcriptional  program  [16].  While  the  dynamical  attractors 
paradigm  has  been  originally  proposed  in  the  context  of  cellular 
developement,  the  similarity  between  cellular  ontogenesis,  i.e.  the 
developement  of  different  cell  types,  and  oncogenesis,  i.e.  the 
process  under  which  normal  cells  are  transformed  into  cancer 
cells,  has  been  recently  emphasized  [17].  The  main  hypothesis  of 


PLOS  ONE  |  www.plosone.org 


1 


August  2014  |  Volume  9  |  Issue  8  |  e105842 


Hopfield  Networks  and  Cancer  Attractors 


this  paper  is  that  cancer  robustness  is  rooted  in  the  dynamical 
robustness  of  signaling  in  an  underlying  cellular  network.  If  the 
cancerous  state  of  rapid,  uncontrolled  growth  is  an  attractor  state 
of  the  system  [18],  a  goal  of  modeling  therapeutic  control  could  be 
to  design  complex  therapeutic  interventions  based  on  drug 
combinations  [19]  that  push  the  cell  out  of  the  cancer  attractor 
basin  [20]. 

Many  authors  have  discussed  the  control  of  biological  signaling 
networks  using  complex  external  perturbations.  Calzolari  and 
coworkers  considered  the  effect  of  complex  external  signals  on 
apoptosis  signaling  [21].  Agoston  and  coworkers  [22]  suggested 
that  perturbing  a  complex  biological  network  with  partial 
inhibition  of  many  targets  could  be  more  effective  than  the 
complete  inhibition  of  a  single  target,  and  explicitly  discussed  the 
implications  for  multi-drug  therapies  [23].  In  the  traditional 
approach  to  control  theory  [24],  the  control  of  a  dynamical  system 
consists  in  finding  the  specific  input  temporal  sequence  required  to 
drive  the  system  to  a  desired  output.  This  approach  has  been 
discussed  in  the  context  of  Kauffmann  Boolean  networks  [25]  and 
their  attractor  states  [26].  Several  studies  have  focused  on  the 
intrinsic  global  properties  of  control  and  hierarchical  organization 
in  biological  networks  [27,28],  A  recent  study  has  focused  on  the 
minimum  number  of  nodes  that  needs  to  be  addressed  to  achieve 
the  complete  control  of  a  network  [29].  This  study  used  a  linear 
control  framework,  a  matching  algorithm  [30]  to  find  the 
minimum  number  of  controllers,  and  a  replica  method  to  provide 
an  analytic  formulation  consistent  with  the  numerical  study. 
Finally,  Cornelius  et  al.  [31]  discussed  how  nonlinearity  in  network 
signaling  allows  reprogrammig  a  system  to  a  desired  attractor  state 
even  in  the  presence  of  contraints  in  the  nodes  that  can  be 
accessed  by  external  control.  This  novel  concept  was  explicitly 
applied  to  a  T-cell  survival  signaling  network  to  identify  potential 
drug  targets  in  T-LGL  leukemia.  The  approach  in  the  present 
paper  is  based  on  nonlinear  signaling  rules  and  takes  advantage  of 
some  useful  properties  of  the  Hopfield  formulation.  In  particular, 
by  considering  two  attractor  states  we  will  show  that  the  network 
separates  into  two  types  of  domains  which  do  not  interact  with 
each  other.  Moreover,  the  Hopfield  framework  allows  for  a  direct 
mapping  of  a  gene  expression  pattern  into  an  attractor  state  of  the 
signaling  dynamics,  facilitating  the  integration  of  genomic  data  in 
the  modeling. 

The  paper  is  structured  as  follows.  In  Mathematical  Model  we 
summarize  the  model  and  review  some  of  its  key  properties. 
Control  Strategies  describes  general  strategies  aiming  at  selectively 
disrupting  the  signaling  only  in  cells  that  are  near  a  cancer 
attractor  state.  The  strategies  we  have  investigated  use  the  concept 
of  bottlenecks,  which  identify  single  nodes  or  strongly  connected 
clusters  of  nodes  that  have  a  large  impact  on  the  signaling.  In  this 
section  we  also  provide  a  theorem  with  bounds  on  the  minimum 
number  of  nodes  that  guarantee  control  of  a  bottleneck  consisting 
of  a  strongly  connected  component.  This  theorem  is  useful  for 
practical  applications  since  it  helps  to  establish  whether  an 
exhaustive  search  for  such  minimal  set  of  nodes  is  practical.  In 
Cancer  Signaling  we  apply  the  methods  from  Control  Strategies  to 
lung  and  B  cell  cancers.  We  use  two  different  networks  for  this 
analysis.  The  first  is  an  experimentally  validated  and  non-specific 
network  (that  is,  the  observed  interactions  are  compiled  from 
many  experiments  conducted  on  heterogeneous  cell  types) 
obtained  front  a  kinase  interactome  and  phospho-protein  database 
[32]  combined  with  a  database  of  interactions  between  transcrip¬ 
tion  factors  and  their  target  genes  [33] .  The  second  network  is  cell- 


specific  and  was  obtained  using  network  reconstruction  algorithms 
and  transcriptional  and  post-translational  data  from  mature 
human  B  cells  [34],  The  algorithmically  reconstructed  network 
is  significantly  more  dense  than  the  experimental  one,  and  the 
same  control  strategies  produce  different  results  in  the  two  cases. 
Finally,  we  close  with  Conclusions. 

Methods 

Mathematical  Model 

We  define  the  adjacency  matrix  of  a  network  G  composed  of  N 
nodes  as 


fl  if  j-+i 
\  0  otherwise  ’ 


(1) 


where  j-*i  denotes  a  directed  edge  from  node  j  to  node  i.  The  set 
of  nodes  in  the  network  G  is  indicated  by  V(G)  and  the  set  of 
directed  edges  is  indicated  by  E(G)  =  {(j,i)  :  /—>/}.  (See  Table  1 
for  a  list  of  mathematical  symbols  used  in  the  text.)  The  spin  of 
node  i  at  time  t  is  ft,  ( t)  =  +  1 ,  and  indicates  an  expresssed  ( +  1 )  or 
not  expressed  (—1)  gene.  We  encode  an  arbitrary  attractor  state 
£  =  (JI\,Z2,—,£n)  with  Z,i=  +  1  by  defining  the  coupling  matrix  [1] 


Jo-  -'itiiiij- 


(2) 


The  total  field  at  node  i  is  then  hi  =  hfLt  +  JT  JjjOj,  where  hexX  is 
the  external  field  applied  to  node  i,  which  will  be  discussed  below. 
The  discrete-time  update  scheme  is  defined  as 


ff/(f  +  Af)  = 


+  1 
-1 


with  prob.  (1+exp [  —  hj(t)/T])  1 
with  prob.  (1 +exp[  +  /!,(0/T])_1 


(3) 


where  T>  0  is  an  effective  temperature.  For  the  remainder  of  the 
paper,  we  consider  the  case  of  T  =  0  so  that  ft,-  =  signf/t,),  and  the 
spin  is  chosen  randomly  from  +  1  if  =  0.  For  convenience,  we 
take  te Z  and  A/=l.  Nodes  can  be  updated  synchronously,  and 
synchronous  updating  can  lead  to  limit  cycles  [9].  Nodes  can  also 
be  updated  separately  and  in  random  order  (anynchronous 
updating),  which  does  not  result  in  limit  cycles.  All  results 
presented  in  this  paper  use  the  synchronous  update  scheme. 

Source  nodes  (nodes  with  zero  indegree)  are  fixed  to  their  initial 
states  by  a  small  external  field  so  that  aq(t)  =  aq(0)  for  all  qeQ, 
where  Q  is  the  set  of  source  nodes.  However,  the  source  nodes  flip 
if  directly  targeted  by  an  external  field.  Biologically,  genes  at  the 
“top”  of  a  network  are  assumed  to  be  controlled  by  elements 
outside  of  the  network. 

In  application,  two  attractors  are  needed.  Define  these  states  as 
and  the  normal  state  and  cancer  state,  respectively.  The 
magnetization  along  attractor  state  a  is 


ma(t)  = 


hi*™- 


(4) 


Note  that  if  ma(t)  =  + 1,  <i(t)  =  ±  <?“.  We  also  define  the  steady 
state  magnetization  along  state  a  as 
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Table  1.  Reference  table  for  symbols. 


Symbol  Explanation 


deg+/~(0  Outdegree/indegree  of  node  i 

(jj  Spin  of  node  =  + 1 

C  ath  attractor 


This  table  lists  all  important  symbols  introduced  in  the  article  with  a  brief 

explanation  of  its  purpose. 

doi:1 0.1 371/journal.pone.0105842.t001 


'<=  lim  (5) 

T  — >  CO  X  Z J 


There  are  two  ways  to  model  normal  and  cancer  cells.  One  way 
is  to  simply  define  a  different  coupling  matrix  for  each  attractor 
state  a, 


(6) 


(7) 


Systems  using  Eqs.  6  and  7  will  be  referred  to  as  the  one 
attractor  state  (p  =  1)  and  two  attractor  state  (p  =  2)  systems, 
respectively.  Eqs.  6  and  7  are  particular  cases  of  the  general 
Hopfield  form  [1] 


(8) 

At=1 

where  p  is  the  number  of  attractor  states,  often  taken  to  be  large. 
An  interesting  property  emerges  when  p  =  2,  however.  Consider  a 
simple  network  composed  of  two  nodes,  with  only  one  edge  1  — >2 
with  attractor  states  tf  and  If,  and  T  =  0.  The  only  nonzero  entry 
of  the  matrix  Jy  is 


J2l=^  +  &cv  (9) 

Note  that  if  Z"  =  +Z°,  Ji\=- Z"Z\-  In  either  case,  by  Eq.  3  we 
have 


if  ffl(/)  =  +  i1 

if  <7,(0= 


(10) 


that  is,  the  spin  of  node  2  at  a  given  time  step  will  be  driven  to 
match  the  attractor  state  of  node  1  at  the  previous  time  step. 
However,  if  Z'l  =  +  Z\  and  Z2  =  +  S2 >  *^21  =0-  This  gives 


ff2(0  = 


+  1 

-1 


with  probability  I/2 
with  probability  V2 


(11) 


In  this  case,  node  2  receives  no  input  from  node  1 .  Nodes  1  and  2 
have  become  effectively  disconnected. 

This  motivates  new  designations  for  node  types.  We  define 
similarity  nodes  as  nodes  with  Z"  —  ^,  and  differential  nodes 
as  nodes  with  £”=—  Zf  We  also  define  the  set  of  similarity 
nodes  S={i  :  and  the  set  of  differential  nodes  D  =  {i  :  Zf 

=  —£?}.  Connections  between  two  similarity  nodes  or  two 
differential  nodes  remain  in  the  network,  whereas  connections 
that  link  nodes  of  different  types  transmit  no  signals.  The  effective 
deletion  of  edges  between  nodes  means  that  the  original  network 
fully  separates  into  two  subnetworks:  one  composed  entirely  of 
similarity  nodes  (the  similarity  network )  and  another  composed 
entirely  of  differential  nodes  (the  differential  network ),  each  of 
which  can  be  composed  of  one  or  more  separate  weakly  connected 
components  (see  Fig.  1).  With  this  separation,  new  source  nodes 
{effective  sources )  can  be  exposed  in  both  the  similarity  and 
differential  networks.  For  the  remainder  of  this  article,  Q  is  the  set 
of  both  source  and  effective  source  nodes  in  a  given  network. 


Alternatively,  both  attractor  states  can  be  encoded  in  the  same 
coupling  matrix, 


Control  Strategies 

The  strategies  presented  below  focus  on  selecting  the  best  single 
nodes  or  small  clusters  of  nodes  to  control,  ranked  by  how  much 
they  individually  change  ma  .  In  application,  however,  controlling 
many  nodes  is  necessary  to  achieve  a  sufficiently  changed  mf. 
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The  effects  of  controlling  a  set  of  nodes  can  be  more  than  the  sum 
of  the  effects  of  controlling  individual  nodes,  and  predicting  the 
truly  optimal  set  of  nodes  to  target  is  computationally  difficult. 
Here,  we  discuss  heuristic  strategies  for  controlling  large  networks 
where  the  combinatorial  approach  is  impractical. 

For  both  p  =  1  and  p  =  2,  simulating  a  cancer  cell  means  that 
ff(0)  =  +  If ,  and  likewise  for  normal  cells.  Although  the  normal 
and  cancer  states  are  mathematically  interchangeable,  biologically 
we  seek  to  decrease  m £,  as  much  as  possible  while  leaving 
m  «  +  1.  By  “network  control”  we  thus  mean  driving  the  system 
away  from  its  initial  state  of  ff(0)  =  ^c  with  /iext.  Controlling 
individual  nodes  is  achieved  by  applying  a  strong  field  (stronger 
than  the  magnitude  of  the  field  due  to  the  node’s  upstream 
neighbors)  to  a  set  of  targeted  nodes  T  so  that 

(lim^oo  )-!<  reT 

hf=\  ■  (12) 

[  0  else 

This  ensures  that  the  drug  field  can  always  overcome  the  field 
from  neighboring  nodes. 

In  application,  similarity  nodes  are  never  deliberately  directly 
targeted,  since  changing  their  state  would  adversely  affect  both 
normal  and  cancer  cells.  Roughly  70%  of  the  nodes  in  the 
networks  surveyed  are  similarity  nodes,  so  the  search  space  is 
reduced.  For/>  =  2,  the  effective  edge  deletion  means  that  only  the 
differential  network  in  cancer  cells  needs  to  be  simulated  to 
determine  the  effectiveness  of  /lext.  For p=l,  however,  there  may 
be  some  similarity  nodes  that  receive  signals  from  upstream 
differential  nodes.  In  this  case,  the  full  effect  of  hext  can  be 
determined  only  by  simulating  all  differential  nodes  as  well  as  any 
similarity  nodes  downstream  of  differential  nodes.  ,411  following 
discussion  assumes  that  all  nodes  examined  are  differential,  and 
therefore  targetable,  for  both  p=  1  and  p  =  2.  The  existence  of 
similarity  nodes  for  p  =  1  only  limits  the  set  of  targetable  nodes. 


Directed  acyclic  networks.  Full  control  of  a  directed  acyclic 
network  is  achieved  by  forcing  aq=—  £,c  for  all  qsQ-  This 
guarantees  mca>  =  —  l.  Suppose  that  nodes  qsQ  in  an  acyclic 
network  have  always  been  fixed  away  from  the  cancer  state,  that  is, 
oq(t-*  —  co)  =  —  Zf  For  any  node  i  to  have  <Ji(t)  =  it  is 
sufficient  to  have  either  ieQ  or  1)  =  £J  for  all  j-*i,  i^Q- 

Because  there  are  no  cycles  present,  all  upstream  paths  of  sufificent 
length  terminate  at  a  source.  Because  the  spin  of  all  nodes  qeQ 
point  away  from  the  cancer  attractor  state,  all  nodes  downstream 
must  also  point  away  from  the  cancer  attractor  state.  Thus,  for 
acyclic  networks,  forcing  aq  =  —  £,c  guarantees  mcm  =  —  1 .  The 
complications  that  arise  from  cycles  are  discussed  in  the  next 
subsubsection.  However,  controlling  nodes  in  Q  may  not  be  the 
most  efficient  way  to  push  the  system  away  from  the  cancer  basin 
of  attraction  and,  depending  on  the  control  limitations,  it  may  not 
be  possible.  If  minimizing  the  number  of  controllers  is  required, 
searching  for  the  most  important  bottlenecks  is  a  better  strategy. 

Consider  a  directed  network  G  and  an  initially  identical  copy, 
G'  =  G.  If  removing  node  i  (and  all  connections  to  and  from  /) 
from  G'  decreases  the  indegree  of  at  least  one  node  jeV(G'),  j  #/, 
to  less  than  half  of  its  indegree  in  network  G,  {/}  is  a  size  1 
bottleneck.  The  bottleneck  control  set  of  bottleneck  {/},  L(i),  is 
defined  algorithmically  as  follows:  (1)  Begin  a  set  L(i)  with  die 
current  bottleneck  /  so  that  L  =  {i};  (2)  Remove  bottleneck  {/} 
from  network  G'\  (3)  Append  L(i)  with  all  nodes  j  with  current 
indegree  that  is  less  than  half  of  that  from  the  original  network  G; 
(4)  Remove  all  nodes  j  from  the  network  G' .  If  additional  nodes  in 
G'  have  their  indegree  reduced  to  below  half  of  their  indegree  in 
G,  go  to  step  3.  Otherwise,  stop.  The  impact  of  the  bottleneck  i,  /(/), 
is  defined  as 

m =\m\,  (i3) 

where  \X\  is  the  cardinality  of  the  set  X .  The  impact  of  a 
bottleneck  is  the  minimum  number  of  nodes  that  are  guaranteed 
to  switch  away  from  the  cancer  state  when  the  bottleneck  is  forced 
away  from  the  cancer  state. 


Similarity  node 
Differential  node 


Figure  1.  Network  segregation  for  two  attractor  states  (p  =  2).  Every  edge  that  connects  a  similarity  node  to  a  differential  node  or  a  differential 
node  to  a  similarity  node  transmits  no  signal.  This  means  that  the  signaling  in  the  right  network  shown  above  is  identical  to  that  of  the  left  network. 
Because  the  goal  is  to  leave  normal  cells  unaltered  while  damaging  cancer  cells  as  much  as  possible,  all  similarity  nodes  can  be  safely  ignored,  and 
searches  and  simulations  only  need  to  be  done  on  the  differential  subnetwork. 
doi:1 0.1 371  /journal,  pone.01 05842.g001 
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The  impact  is  used  to  rank  the  size  1  bottlenecks  by  importance, 
with  the  most  important  as  those  with  the  largest  impact.  In 
application,  when  searching  for  nodes  to  control,  any  size  1 
bottleneck  { z}  that  appears  in  the  bottleneck  control  set  of  a 
different  size  1  bottleneck  {/’}  can  be  ignored,  since  fixing  j  to  the 
normal  state  fixes  i  to  the  normal  state  as  well.  Note  that  the 
definition  given  above  in  terms  of  G  and  G'  avoids  miscounting  in 
the  impact  of  a  bottleneck. 

The  network  in  Fig.  2,  for  example,  has  three  sources  (nodes  1, 
2  and  3),  but  one  important  bottleneck  (node  6).  If  full  damage,  i.e. 
mca0  =  —  1,  is  required,  then  control  of  all  source  nodes  is 
necessary.  If  minimizing  the  number  of  directly  targeted  nodes  is 
important  and  mc  >  —  1  can  be  tolerated,  then  control  of  the 
bottleneck  node  6  is  a  better  choice. 

Directed  cycle-rich  networks.  Not  all  networks  can  be  fully 
controlled  at  T  =  0  by  controlling  the  source  nodes,  however.  If 
there  is  a  cycle  present,  paths  of  infinite  length  exist  and  the  final 
state  of  the  system  may  depend  on  the  initial  state,  causing  parts  of 
the  network  to  be  hysteretic.  Controlling  only  sources  in  a  general 
directed  network  thus  does  not  guarantee  mf  =  —  1  unless  the 
system  begins  with  07  =  —  Sf. 

Define  a  cycle  cluster ,  C,  as  a  strongly  connected  subnetwork  of 
a  network  G.  The  network  in  Fig.  3,  for  example,  has  one  cycle 
cluster  with  nodes  F(C)  =  {4,5, 6, 7}.  If  the  network  begins  with 
ct(0)  =  <JC,  forcing  both  source  nodes  away  from  the  cancer  state 
does  nothing  to  the  nodes  downsteam  of  node  3  (see  Fig.  4).  This  is 
because  the  indegree  deg~(4)  =  4,  and  a  majority  of  the  nodes 
connecting  to  node  4  are  in  the  cancer  attractor  state.  At  T  =  0, 
cycle  clusters  with  high  connectivity  tend  to  block  incoming  signals 
front  outside  of  the  cluster,  resulting  in  an  insurmountable 
activation  barrier. 

The  most  effective  single  node  to  control  in  this  network  is  any 
one  of  nodes  4,  6  or  7.  Forcing  any  of  these  away  from  the  cancer 
attractor  state  will  eventually  cause  the  entire  cycle  cluster  to  flip 
away  from  the  cancer  state,  and  all  nodes  downstream  will  flip  as 
well,  as  shown  in  Fig.  4.  The  cycle  cluster  here  acts  as  a  sort  of 
large,  hysteretic  bottleneck.  We  now  generalize  the  concept  of 
bottlenecks. 

Define  a  size  k  bottleneck  in  a  network  G  to  be  a  cycle  cluster  B 
with  \V(B)\  =  k  which,  when  removed  from  G,  reduces  the 
indegree  of  at  least  one  node  jeV(G),  j^V(B)  to  less  than  half  of 
its  original  indegree.  Other  than  now  using  the  set  of  nodes  V(B) 
rather  than  a  single  node  set,  the  above  algorithm  for  finding  the 
bottleneck  control  set  remains  unchanged.  In  Fig.  3,  for  instance, 
FOB)  =  {4,5, 6, 7},  k  =  4,  L(B)  =  {4,5,6,7,8,9,10},  and  1(B)  =  7. 
With  this  more  general  definition,  we  note  that  controlling  any  size 
k  bottleneck  B  guarantees  control  of  all  size  1  bottlenecks  B'  in  the 
control  set  of  B  for  all  k>  1 . 

For  any  bottleneck  B  of  size  k  >  1  in  a  network  G,  define  the  set 
of  critical  nodes,  Z(B,G),  as  the  set  of  nodes  Z(B,G)^  V(B)  of 
minimum  cardinality  that,  when  controlled,  guarantees  full  control 
of  all  nodes  ie  V(B)  after  a  transient  period.  Also  define  the  critical 
number  of  nodes  as  ncrlt(B,G)  =  \Z(B,G)\.  Thus,  for  the  network  in 
Fig.  3,  Z(B,G)  =  { 4},  {6},  or  {7},  and  na-\t(B,G)=  1. 

In  general,  however,  more  than  one  node  in  a  cycle  cluster  may 
need  to  be  targeted  to  control  the  entire  cycle  cluster.  Fig.  5  shows 
a  cycle  cluster  (composed  of  nodes  2-10)  that  cannot  be  controlled 
by  targeting  any  single  node.  The  precise  value  of  77crit  for  a  given 
cycle  cluster  C  depends  on  its  topology  as  well  as  the  edges 
connecting  nodes  from  outside  of  C  to  the  nodes  inside  of  C,  and 
finding  Z(C,G)  can  be  difficult.  We  present  a  theorem  that  puts 
bounds  on  na\ t  to  help  determine  whether  a  search  for  Z(C,G )  is 
practical. 


Figure  2.  A  directed  acyclic  network.  Controlling  all  three  source 
nodes  (nodes  1 ,  2  and  3)  guarantees  full  control  of  the  network,  but  are 
ineffective  when  targeted  individually.  The  best  single  node  to  control 
in  this  network  is  node  6  because  it  directly  controls  all  downstream 
nodes. 

doi:1 0.1 371/journal.pone.01 05842.g002 

Theorem. :  Suppose  a  network  G  contains  a  cycle  cluster  C. 
Define  the  set  of  externally  influenced  nodes 

R(C,G)  =  {ieV(C)  :  jeV(G\C),(j,i)eE(G)} ,  (14) 

the  set  of  intruder  connections 

W(C,G)  =  {(j,i)eE(G) :  ieV(C)JeV(G\C)},  (15) 

and  the  reduced  set  of  critical  nodes 

ZkA(C,G)  =  Z(C,G\W).  (16) 

If  AT  =  |F(C)|  and 

li  =  min  deg~(r),  (17) 

ieF(0 

where  deg  (/)  is  computed  ignoring  intruder  connections,  then 
r^]^«crit(C,G)<{,  (18) 

where 


C  =  min(Jf1  +  |tf(C,G)\Zred(C,G)|,V).  (19) 

Proof:  First,  prove  the  lower  limit  of  Eq.  18.  Let  C  be  a  cycle 
cluster  in  a  network  G  with  R(C,G)  =  {0}.  (A  cycle  cluster  in  a 
network  with  |.R(C,G)|  >0  will  have  the  same  or  higher  activation 
barrier  for  any  node  in  the  cluster  than  the  same  cycle  cluster  in  a 
network  with  R={0}.  Since  we  are  examining  the  lower  limit  of 
Eq.  18,  we  consider  the  case  with  the  lowest  activation  barrier. 
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Figure  3.  A  network  in  which  nodes  4,  5,  6  and  7  compose  a 
single  cycle  cluster.  The  high  connectivity  of  node  4  prevents  any 
changes  made  to  the  spin  of  nodes  1  -3  from  propagating  downstream. 
The  only  way  to  indirectly  control  nodes  8-10  is  to  target  nodes  inside 
of  the  cycle  cluster.  Targeting  node  4,  6  or  7  will  cause  the  entire  cycle 
cluster  to  flip  away  from  its  initial  state,  guaranteeing  control  of  nodes 
4-10  (see  Fig.  4). 

doi:1 0.1 371  /journal.pone.01 05842.g003 

Any  externally  influenced  nodes  cause  Mcr;t  to  either  increase  or 
remain  the  same.)  For  any  node  i  to  be  able  to  flip  away  from  the 
cancer  state  (although  not  necessarily  remain  there),  we  must  have 
that  hj  =  —  a^Cj  for  a  >  0,  meaning  that  at  least  half  of  the  nodes 
upstream  of  i  must  point  away  from  the  cancer  state.  The  node  i 
requiring  the  smallest  number  of  upstream  nodes  to  be  in  the 
normal  state  is  the  node  that  satisfies  deg~(z)  =  /(.  Controlling  less 
than  /(/ 2  nodes  will  leave  all  uncontrolled  nodes  with  a  field  in  the 
cancer  direction,  and  no  more  flips  will  occur.  Thus, 

Merit  >  f  •  (20) 


field  at  all  other  nodes  will  point  away  from  the  cancer  direction. 
The  system  will  then  require  one  more  time  step  to  completely 
settle  to  a ,■=—£?..  Thus,  we  have  that  for  C  =  K /v  in  a  network  G 
with  R(C,G)  =  {0}, 

ncrit(KN,G)  =  [fl  (23) 

Kn  with  (7/(0)  =  gives  the  largest  activation  barrier  for  any  cycle 
cluster  on  N  nodes  with  R(C,G)  =  {0 }  to  switch  away  from  the 
cancer  attractor  state.  A  general  cycle  cluster  C  with  any  topology 
on  N  nodes  with  R(C,G)  =  {0 }  in  a  network  G  will  have 
deg~  (i)<N  for  all  nodes  i,  and  so  we  have  the  upper  bound 

Mcrlt(C,G)<[y],  (24) 

thus  proving  Eq.  18  for  the  special  case  of  R(C,G)=  {0}. 

Now  consider  a  cycle  cluster  C  on  N  nodes  in  a  network  G  with 
|7?(C,G)|>0.  Suppose  all  nodes  in  Zred(C,G)  are  fixed  away 
from  the  cancer  state.  By  Eq.  24,  |Zred(C,G)|  <  [7V/2].  For  any 
node  ie(R(C,G)(~\Zle&(C,G)),  ff,(/->co)  =  —  is  guaranteed 
because  it  has  already  been  directly  controlled.  Any  node 
ie{R(C,G)\Zieii(C,G))  has  some  incoming  connections  from 
nodes  j^V(C),  and  these  connections  could  increase  the  activation 
barrier  enough  such  that  fixing  Zre<j(C,G)  is  not  enough  to 
guarantee  (T,(r->co)=  —  f/.  To  ensure  that  any  node  leV(C) 
points  away  from  the  cancer  state,  it  is  sufficient  to  fix  all  nodes 
/e(_R(C,G)\Zred(C,G))  as  well  as  Zred(C,G)  away  front  the  cancer 
state.  This  increases  Mcrit  by  at  most  |.R(C,G)\Zred(C,G)|,  leaving 

»crit(C,G) <  r^TI  +  l*(C,G)\Zred(C,G)|.  (25) 

«crit  can  never  exceed  N,  however,  because  directly  controlling 
every  node  results  in  controlling  C.  We  can  thus  say  that 

«crit(C,G)<min("[y”|  +|7?(C,G)\Zred(C,G)|,iVy  (26) 


For  the  upper  limit  of  Eq.  18,  consider  a  complete  clique  on  N 
nodes,  C  =  A'y  (that  is,  Ag=  1  for  all  i,jeV(K0,  including  self 
loops)  in  a  network  G.  First,  let  there  be  no  connections  to  any 
nodes  in  C  from  outside  of  C  so  that  R(C,G)  =  {0}.  For  odd  N, 
forcing  (N  + 1  )/2  nodes  away  from  the  cancer  state  will  result  in 
the  field 


Finally,  combining  the  upper  limit  in  Eq.  26  with  the  lower  limit 
front  Eq.  20  gives  Eq.  18.  ■ 

There  can  be  more  than  one  Zred  for  a  given  cycle  cluster.  Note 
that  the  tightest  constraints  on  Mcrjt  in  Eq.  18  come  from  using  the 
Zred  with  the  largest  overlap  with  R.  If  finding  Zred  is  too  difficult, 
an  overestimate  for  the  upper  limit  of  Mcr;t  can  be  made  by 
assuming  that  R0Zie d  =  {0}  so  that 


'N- 1 

2 


iV+1 

2 


(21) 


f fl  <»crit(C,G)<  minjj^n  +I*(C,G)|,  Nj.  (27) 


for  all  nodes  i.  After  one  time  step,  all  nodes  will  flip  away  from  the 
cancer  state.  For  even  N,  forcing  N / 2  nodes  away  from  the  cancer 
state  will  result  in  the  field 


y.  j‘jaj — 


(22) 


for  all  nodes  i.  At  the  next  time  step,  the  unfixed  nodes  will  pick 
randomly  between  the  normal  and  cancer  state.  If  at  least  one  of 
these  nodes  makes  the  transition  away  front  the  cancer  state,  the 


The  cycle  cluster  in  Fig.  5  has  N  =  9,  R  =  {2,9},  (i=  1,  and  one 
of  the  reduced  sets  of  critical  nodes  is  Zred  =  {9,10},  so 
1<  Merit  ^6.  It  can  be  shown  through  an  exhaustive  search  that 
for  this  network  Mcr;t  =  2,  and  the  set  of  critical  nodes  is  Z  =  {9, 1 0} 
(see  Fig.  6).  Here,  Z  =  Zre d,  although  this  is  not  always  the  case. 
Because  the  cycle  cluster  has  9  nodes  and  1  <  Mcr;t  <  6,  at  most 
6  /9  \ 

X]„  =  i(  n  )  =465  simulations  are  needed  to  find  at  least  one 
solution  for  Z(C,G).  However,  the  maximum  number  of 
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Figure  4.  Cancer  magnetization  from  targeting  various  nodes  ii 

averaging  removes  fluctuations  due  to  the  random  flipping  of  nodes  with 
any  one  of  nodes  4,  6  or  7  results  in  the  same  final  magnetization. 
doi:1 0.1 371/journal.pone.01 05842.g004 

simulations  required  to  find  Z(C,G )  increases  exponentially  and 
for  larger  networks  the  problem  quickly  becomes  intractable. 

One  heuristic  strategy  for  controlling  cycle  clusters  is  to  look  for 
size  k'  <|K(C)[  bottlenecks  inside  of  C.  Bottlenecks  of  size  k»  1 
and  average  indegree  <(deg~(i?))>«^  can  contain  high  impact  size 
k'  bottlenecks,  where  k'  <  k.  Size  k  >  1  bottlenecks  need  to  be 
compared  to  find  the  best  set  of  nodes  to  target  to  reduce  mf. 
Simply  comparing  the  impact  is  insufficent  because  a  cycle  cluster 
with  a  large  impact  could  also  have  a  large  nCT\ t,  requiring  much 
more  effort  than  its  impact  merits.  Define  the  critical  efficiency  of  a 
bottleneck  B  as 

(28) 

If  the  critical  efficiency  of  a  cycle  cluster  is  much  smaller  than  the 
impacts  of  size  1  bottlenecks  from  outside  of  the  cycle  cluster,  the 
the  cycle  cluster  can  be  safely  ignored. 

For  some  cycle  clusters,  however,  not  all  of  the  nodes  need  to  be 
controlled  in  order  for  a  large  portion  of  the  nodes  in  the  cycle 
cluster’s  control  set  to  flip.  Define  the  optimal  efficiency  of  a 
bottleneck  B  as 
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ri  the  network  shown  in  Fig.  3,  averaged  over  10,000  runs.  The 

hi  =  0.  Targeting  node  7  results  in  the  quickest  stabilization,  but  targeting 


eopt (B)=  max  (29) 

n=  1,2,...  y  n  J 

where  B,  £2  V(B)  are  size  1  bottlenecks  and  I(Bi)> I(Bi+ 1)  for  all 
i.  Note  that  for  any  size  1  bottleneck  B,  eopi{B)  =  eCIn(B)=I(B). 
This  quantity  thus  allows  bottlenecks  with  very  different  properties 
(1(B),  ncrn(B,G),  or  \  V(B)\)  to  be  ranked  against  each  other. 

All  strategies  presented  above  are  designed  to  select  the  best 
individual  or  small  group  of  nodes  to  target.  Significant  changes  in 
the  biological  networks’  magnetization  require  targeting  many 
nodes,  however.  Brute  force  searches  on  the  effect  of  larger 
combinations  of  nodes  are  typically  impossible  because  the 
required  number  of  simulations  scales  exponentially  with  the 
number  of  nodes.  A  crude  Monte  Carlo  search  is  also  numerically 
expensive,  since  it  is  difficult  to  sample  an  appreciable  portion  of 
the  available  space.  One  alternative  is  to  take  advantage  of  the 
bottlenecks  that  can  be  easily  found,  and  rank  all  size  k  >  1 
bottlenecks  B,  in  an  ordered  list  U  such  that 

U=(BuB2,B3,...)  (30) 

where 
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Figure  5.  A  network  with  a  cycle  cluster  C,  composed  of  nodes 
2-10,  that  cannot  be  controlled  at  7  =  0  by  controlling  any 
single  node.  Here,  the  set  of  externally  influenced  nodes  is 
R(C,G)  =  {2,9},  the  set  of  intruder  connections  is  W(C,G)  =  {(  1,2), 
(1,9)},  the  reduced  set  of  critical  nodes  is  Zred(C,G)  =  { 9,10},  the 
minimum  indegree  is  fi  =  1  and  the  number  of  nodes  in  the  cycle  cluster 
is  IV  =  9.  By  Eq.  1 8,  this  gives  the  bounds  of  the  critical  number  of  nodes 
to  be  1  <  ncrjt  <  6. 

doi:1 0.1 371/journal.pone.01 05842.g005 


eopt{Bi)>eopt(Bi+l),  Bi<fL(Bj)  (31) 

for  all  Bj,BjeU  and  fix  the  bottlenecks  in  the  list  in  order.  This  is 
called  the  efficiency-ranked  strategy.  If  all  size  k  >  1  bottlenecks  are 
ignored,  it  is  called  the  pure  efficiency-ranked  strategy,  and  if  size 


k  >  1  bottlenecks  are  included  it  is  called  the  mixed  efficiency- 
ranked  strategy. 

An  effective  polynomial-time  algorithm  for  finding  the  top  z 
nodes  to  fix,  which  we  call  the  best+1  strategy  (equivalent  to  a 
greedy  algorithm),  works  as  follows:  (1)  Begin  with  a  seed  set  of 
nodes  to  fix,  F\  (2)  Test  the  effect  of  fixing  F\Ji  for  all  allowed 
nodes  i£F\  (3)  F<-F^Ji\,est,  where  /best  is  the  best  node  from  all  i 
sampled;  (4)  If  \F\<z,  go  to  step  (2).  Otherwise,  stop.  The  seed  set 
of  nodes  could  be  the  single  highest  impact  size  1  bottleneck  in  the 
network,  or  it  could  be  the  best  set  of  n  nodes  (where  n  <  z)  found 
from  a  brute  force  search. 

Cancer  Signaling 

In  application  to  biological  systems,  we  assume  that  the 
magnetization  of  cell  type  a  is  related  to  the  viability  of  cell  type 
a,  that  is,  the  fraction  of  cells  of  type  a  that  survives  a  drug 
treatment.  It  is  reasonable  to  assume  that  the  viability  of  cell  type 
a,  va(m‘‘  ),  is  a  monotonically  increasing  function  of  ma  .  Because 
the  exact  relationship  is  not  known,  we  analyze  the  effect  of 
external  perturbations  in  terms  of  the  final  magnetizations. 

We  need  to  use  as  few  controllers  as  possible  to  sufficiently 
reduce  mf  while  leaving  mf  «  + 1 .  In  practical  applications, 
however,  one  is  limited  in  the  set  of  druggable  targets.  All  classes  of 
drugs  are  constrained  to  act  only  on  a  specific  set  of  biological 
components.  For  example,  one  class  of  drugs  that  is  currently 
under  intense  research  is  protein  kinase  inhibitors  [35].  In  this  case 
one  has  two  constraints:  the  only  nodes  that  can  be  targeted  are 
those  that  correspond  to  kinases,  and  they  can  only  be  inhibited, 
i.e.  turned  off.  We  will  use  the  example  of  kinase  inhibitors  to  show 
how  control  is  affected  by  such  types  of  constraints.  In  the  real 
systems  studied,  many  differential  nodes  have  only  similarity  nodes 
upstream  and  downstream  of  them,  while  the  remaining 
differential  nodes  form  one  large  cluster.  This  is  not  important 
for  p  =  1 ,  but  the  effective  edge  deletion  for  p  =  2  results  in  many 
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Figure  6.  Magnetization  for  network  from  Fig.  5,  averaged  over  10,000  runs.  There  is  no  single  node  to  target  that  will  control  the  cycle 
cluster,  but  fixing  nodes  9  and  10  results  in  full  control  of  the  cycle  cluster,  leaving  only  node  1  in  the  cancer  state.  This  means  Z(C,G)  =  {9,10}  and 
Merit  =2. 

doi:1 0.1 371  /journal. pone.01 05842.g006 
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Table  2.  General  properties  of  the  full  networks. 


Properties 

Lung 

B  cell 

Nodes 

9073 

4364 

Edges 

45635 

55144 

Sources 

129 

8 

Sinks 

8443 

1418 

Av.  outdegree 

5.03 

12.64 

Max  outdegree 

240 

2372 

Max  indegree 

68 

196 

Self-loops 

238 

0 

Undirected  edges 

350 

23386 

Diameter 

11 

11 

Max  cycle  cluster 

401 

2886 

Av.  clustering  coeff.  [73] 

0.0544 

0.2315 

The  network  used  for  the  analysis  of  lung  cancer  is  a  generic  one  obtained 
combining  the  data  sets  in  Refs.  [32]  and  [33].  The  B  cell  network  is  a  curated 
version  of  the  B  cell  interactome  obtained  in  Ref.  [34]  using  a  network 
reconstruction  method  and  gene  expression  data  from  B  cells. 
doi:1 0.1 371/journal.pone.0105842.t002 


islets,  which  are  nodes  i  with  Ay  =  Ajj  =  0  for  all  i=£j  (self-loops 
allowed).  Controlling  islets  requires  targeting  each  islet  individu¬ 
ally.  For  p  =  2,  we  concentrate  on  controlling  only  the  largest 
weakly  connected  differential  subnetwork.  All  final  magnetizations 
are  normalized  by  the  total  number  of  nodes  in  the  full  network, 
even  if  the  simulations  are  only  conducted  on  small  portion  of  the 
network. 

The  data  files  for  all  networks  and  attractors  analyzed  below  can 
be  found  in  Supporting  Information. 

Lung  Cell  Network 

The  network  used  to  simulate  lung  cells  was  built  by  combining 
the  kinase  interactome  from  PhosphoPOINT  [32]  with  the 
transcription  factor  interactome  from  TRANSFAC  [33].  Both  of 
these  are  general  networks  that  were  constructed  by  compiling 
many  observed  pairwise  interactions  between  components,  mean¬ 
ing  that  if  j-*i,  at  least  one  of  the  proteins  encoded  by  gene  j  has 
been  directly  observed  interacting  with  gene  i  in  experiments.  This 
bottom-up  approach  means  that  some  edges  may  be  missing,  but 
those  present  are  reliable.  Because  of  this,  the  network  is  sparse 
(~ 0.057%  complete,  see  Table  2),  resulting  in  the  formation  of 
many  islets  for  p  =  2.  Note  also  that  this  network  presents  a  clear 
hierarchical  structure,  characteristic  of  biological  networks 
[36,37],  with  many  “sink”  nodes  [38]  that  are  targets  of 
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Figure  7.  Final  cancer  magnetizations  for  an  unconstrained  search  on  the  lung  cell  network  using  p=  1.  The  efficiency-ranked  strategy 
outperforms  the  relatively  expensive  Monte  Carlo  strategy.  The  best+1  strategy  works  best,  although  it  requires  the  largest  computational  time.  Note 
that  the  mixed  efficiency-ranked  curve  is  not  shown  because  it  is  identical  to  the  pure  efficiency-ranked  curve.  Key  for  magnetization  curves:  MC  = 
Monte  Carlo,  B+1  =  best+1,  ERP=  pure  efficiency-ranked,  ERM=  mixed  efficiency-ranked,  EX=  exhausive  search. 
doi:1 0.1 371/journal.pone.01 05842.g007 
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Figure  8.  Final  cancer  magnetizations  for  an  unconstrained  search  on  the  lung  cell  network  using  p= 2.  As  in  the  p  =  1  case,  the 
efficiency-ranked  strategy  outperforms  the  expensive  Monte  Carlo  search.  The  plateaus  in  the  efficiency-ranked  strategy  when  fixing  9-10,  12-15,  20- 
21,  etc.  nodes  are  a  result  of  targeting  bottlenecks  that  are  already  indirectly  controlled. 
doi:1 0.1 371 /journal. pone.0105842.g008 


transcription  factors  and  a  relatively  large  cycle  cluster  originating 
from  the  kinase  interactome. 

It  is  important  to  note  that  this  is  a  non-specific  network, 
whereas  real  gene  regulatory  networks  can  experience  a  sort  of 
“rewiring”  for  a  single  cell  type  under  various  internal  conditions 
[39].  In  this  analysis,  we  assume  that  the  difference  in  topology 
between  a  normal  and  a  cancer  cell’s  regulatory  network  is 
negligible.  The  methods  described  here  can  be  applied  to  more 
specialized  networks  for  specific  cell  types  and  cancer  types  as 
these  networks  become  more  widely  available. 

In  our  signaling  model,  the  IMR-90  cell  line  [40,41]  was  used 
for  the  normal  attractor  state,  and  the  two  cancer  attractor  states 
examined  were  from  the  A549  (adenocarcinoma)  [42 — 46]  and 
NCI-H358  (bronchioalveolar  carcinoma)  [42,43]  cell  lines.  Gene 
expression  measurements  from  all  referenced  studies  for  a  given 
cell  line  were  averaged  together  to  create  a  single  attractor.  The 
resulting  magnetization  curves  for  A549  and  NCI-H358  are  very 
similar,  so  the  following  analysis  addresses  only  A549.  The  full 
network  contains  9073  nodes,  but  only  1175  of  them  are 
differential  nodes  in  the  IMR-90/A549  model.  In  the  uncon¬ 
strained  p  =  1  case,  all  1175  differential  nodes  are  candidates  for 
targeting.  Exhaustively  searching  for  the  best  pair  of  nodes  to 
control  requires  investigating  689725  combinations  simulated  on 
the  full  network  of  9073  nodes.  However,  1094  of  the  1175  nodes 
are  sinks  (i.e.  nodes  i  with  outdegree  deg+(/)  =  0,  ignoring  self 


loops)  and  therefore  have  I (i)  =  eopl(i)  =  1,  which  can  be  safely 
ignored.  The  search  space  is  thus  reduced  to  8 1  nodes,  and  finding 
even  the  best  triplet  of  nodes  exhaustively  is  possible.  Including 
constraints,  only  3 1  nodes  are  differential  kinases  with  =  + 1 . 
This  reduces  the  search  space  at  the  cost  of  increasing  the 
minimum  achievable 

There  is  one  important  cycle  cluster  in  the  full  network,  and  it  is 
composed  of  401  nodes.  This  cycle  cluster  has  an  impact  of  7948 
for  p=  1,  giving  a  critical  efficiency  of  at  least  ~  19.8,  and 
1  <«crit  <401  by  Eq.  27.  The  optimal  efficiency  for  this  cycle 
cluster  is  e0pt  =  29,  but  this  is  achieved  for  fixing  the  first  bottleneck 
in  the  cluster.  Additionally,  this  node  is  the  highest  impact  size  1 
bottleneck  in  the  full  network,  and  so  the  mixed  efficiency-ranked 
results  are  identical  to  the  pure  efficiency-ranked  results  for  the 
unconstrained  p  =  1  lung  network.  The  mixed  efficiency-ranked 
strategy  was  thus  ignored  in  this  case. 

Fig.  7  shows  the  results  for  the  unconstrained  p  =  1  model  of  the 
IMR-90/A549  lung  cell  network.  (All  simulations  were  performed 
using  MATLAB  on  a  desktop  computer.  Running  the  simulations 
to  make  all  curves  shown  below  required  approximately  12  hours.) 
The  unconstrained  p  =  1  system  has  the  largest  search  space,  so  the 
Monte  Carlo  strategy  performs  poorly.  The  best-1- 1  strategy  is  the 
most  effective  strategy  for  controlling  this  network.  The  seed  set  of 
nodes  used  here  was  simply  the  size  1  botffeneck  with  the  largest 
impact.  Note  that  best-1- 1  works  better  than  effeciency-ranked. 
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Table  3.  Properties  of  the  largest  weakly  connected  differential  subnetworks  for  all  cell  types. 
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This  is  because  best+1  includes  the  synergistic  effects  of  fixing 
multiple  nodes,  while  efficiency-ranked  assumes  that  there  is  no 
overlap  between  the  set  of  nodes  downstream  from  multiple 
bottlenecks.  Importantly,  however,  the  efficiency-ranked  method 
works  nearly  as  well  as  best+1  and  much  better  than  Monte  Carlo, 
both  of  which  are  more  computationally  expensive  than  the 
efficiency-ranked  strategy. 

Fig.  8  shows  the  results  for  the  unconstrained  p  =  2  model  of  the 
IMR-90/A549  lung  cell  network.  The  search  space  for  p  =  2  is 
much  smaller  than  that  for  p  =  1 .  The  largest  weakly  connected 
differential  subnetwork  contains  only  506  nodes  (see  Table  3),  and 
the  remaining  differential  nodes  are  islets  or  are  in  subnetworks 
composed  of  two  nodes  and  are  therefore  unnecessary  to  consider. 
Of  these  506  nodes,  450  are  sinks.  Fig.  9  shows  the  largest  weakly 
connected  component  of  the  differential  subnetwork,  and  the  top 
five  bottlenecks  in  the  unconstrained  case  are  shown  in  red.  If 
limiting  the  search  to  differential  kinases  with  ^  =  + 1  and 
ignoring  all  sinks,  p  =  2  has  19  possible  targets.  There  is  only  one 
cycle  cluster  in  the  largest  differential  subnetwork,  containing  6 
nodes.  Like  the  p  =  1  case,  the  optimal  efficiency  occurs  when 
targeting  the  first  node,  which  is  the  highest  impact  size  1 
bottleneck.  Because  the  mixed  efficiency-ranked  strategy  gives  the 
same  results  as  the  pure  efficiency-ranked  strategy,  only  the  pure 
strategy  was  examined.  The  Monte  Carlo  strategy  fares  better  in 
the  unconstrained  p  =  2  case  because  the  search  space  is  smaller. 
Additionally,  the  efficiency-ranked  strategy  does  worse  against 
the  best+1  strategy  for/>  =  2  than  it  did  for  p=  1.  This  is  because 
the  effective  edge  deletion  decreases  the  average  indegree  of  the 
network  and  makes  nodes  easier  to  control  indirectly.  When  many 
upstream  bottlenecks  are  controlled,  some  of  the  downstream 
bottlenecks  in  the  efficiency-ranked  list  can  be  indirectly 
controlled.  Thus,  controlling  these  nodes  directly  results  in  no 
change  in  the  magnetization.  This  gives  the  plateaus  shown  for 
fixing  nodes  9-10  and  12-15,  for  example. 

The  only  case  in  which  an  exhaustive  search  is  possible  is  for 
p  =  2  with  constraints,  which  is  shown  in  Fig.  10.  Note  that  the 
polynomial-time  best+1  strategy  identifies  the  same  set  of  nodes  as 
the  exponential-time  exhaustive  search.  This  is  not  surprising, 
however,  since  the  constraints  limit  the  available  search  space. 
This  means  that  the  Monte  Carlo  also  does  well.  The  efficiency- 
ranked  method  performs  worst.  The  efficiency-ranked  strategy  is 
designed  to  be  a  heuristic  strategy  that  scales  gently,  however,  and 
is  not  expected  to  work  well  in  such  a  small  space  when  compared 
with  more  computationally  expensive  methods. 

B  Cell  Network 

The  B  cell  network  was  derived  from  the  B  cell  interactome  of 
Ref.  [34],  The  reconstruction  method  used  in  Ref.  [34]  removes 
edges  from  an  initially  complete  network  depending  on  pairwise 
gene  expression  correlation.  Additionally,  the  original  B  cell 
network  contains  many  protein-protein  interactions  (PPIs)  as  well 
as  transcription  factor-gene  interactions  (TFGIs).  TFGIs  have 
definite  directionality:  a  transcription  factor  encoded  by  one  gene 
affects  the  expression  level  of  its  target  gene(s).  PPIs,  however,  do 
not  have  obvious  directionality.  We  first  filtered  these  PPIs  by 
checking  if  the  genes  encoding  these  proteins  interacted  according 
to  the  PhosphoPOINT/TRANSFAC  network  of  the  previous 
section,  and  if  so,  kept  the  edge  as  directed.  If  the  remaining  PPIs 
are  ignored,  the  results  for  the  B  cell  are  similar  to  those  of  the  lung 
cell  network.  We  found  more  interesting  results  when  keeping  the 
remaining  PPIs  as  undirected,  as  is  discussed  below. 

Because  of  the  network  construction  algorithm  and  the  inclusion 
of  many  undirected  edges,  the  B  cell  network  is  more  dense 
(~ 0.290%  complete,  see  Table  2)  than  the  lung  cell  network.  This 
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Figure  9.  Largest  weakly  connected  differential  subnetwork  for  IMR-90/A549  and  p=  2.  Out  of  the  506  pictured  nodes,  450  are  sinks  and 
therefore  have  an  impact  equal  to  one.  The  top  five  bottlenecks  are  labeled  with  their  gene  names  and  colored  orange. 
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higher  density  leads  to  many  more  cycles  than  the  lung  cell 
network,  and  many  of  these  cycles  overlap  to  form  one  very  large 
cycle  cluster  containing  ~66%  of  nodes  in  the  full  network.  All 
gene  expression  data  used  for  B  cell  attractors  was  taken  from  Ref. 
[47] .  We  analyzed  two  types  of  normal  B  cells  (naive  and  memory) 
and  three  types  of  B  cell  cancers  (diffuse  large  B-cell  lymphoma 
(DLBCL),  follicular  lymphoma,  and  EBV-immortalized  lympho¬ 
blastoma),  giving  six  combinations  in  total.  We  present  results  for 
only  the  naive/DLBCL  combination  below,  but  Tables  3  and  4 
list  the  properties  of  all  normal/cancer  combinations.  Again,  all 
gene  expression  measurements  for  a  given  cell  type  were  averaged 
together  to  produce  a  single  attractor.  The  full  B  cell  network  is 
composed  of  4364  nodes.  For  p=\,  there  is  one  cycle  cluster  C 


composed  of  2886  nodes.  This  cycle  cluster  has  l<nCrit(0 
<  1460, 7(C)  =  4353,  and  3.0 <e0rit(O< 4353.  Finding  Z(C )  was 
deemed  too  difficult. 

Fig.  1 1  shows  the  results  for  the  unconstrained  p  =  1  case.  Again, 
the  pure  efficiency-ranked  strategy  gave  the  same  results  as  the 
mixed  efficiency-ranked  strategy,  so  only  the  pure  strategy  was 
analyzed.  As  shown  in  Fig.  11,  the  Monte  Carlo  strategy  is  out¬ 
performed  by  both  the  efficiency-ranked  and  best+1  strategies. 
The  synergistic  effects  of  fixing  multiple  bottlenecks  slowly 
becomes  apparent  as  the  best+1  and  efficiency-ranked  curves 
separate. 

Fig.  12  shows  the  results  for  the  unconstrained  p  =  2  case.  The 
largest  weakly  connected  subnetwork  contains  one  cycle  cluster 
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Figure  10.  Final  cancer  magnetizations  for  a  constrained  search  on  the  lung  cell  network  using  p=  2.  This  is  the  only  case  in  which  a 
limited  exhaustive  search  is  possible.  Interestingly,  the  exhaustive  search  locates  the  same  nodes  as  the  best+1  strategy  for  fixing  up  to  eight  nodes. 
The  efficiency-ranked  strategy  performs  poorly  compared  to  the  Monte  Carlo  strategy  because  the  search  space  is  small  and  a  large  portion  of  the 
available  space  is  sampled  by  the  Monte  Carlo  search. 
doi:1 0.1 371 /journal. pone.01 05842.g01 0 


with  351  nodes,  with  l<«crit<208.  Although  finding  a  set  of 
critical  nodes  is  difficult,  the  optimal  efficiency  for  this  cycle  cluster 
is  62.2  for  fixing  10  bottlenecks  in  the  cycle  cluster.  This  makes 
targeting  the  cycle  cluster  worthwhile.  The  efficiency  of  this  set  of 
10  nodes  is  larger  than  the  efficiencies  of  the  first  10  nodes  from 
the  pure  efficiency-ranked  strategy,  so  the  mf  from  the  mixed 
strategy  drops  earlier  than  the  pure  strategy.  Both  strategies 
quickly  identify  a  small  set  of  nodes  capable  of  controlling  a 
significant  portion  of  the  differential  network,  however,  and  the 
same  result  is  obtained  for  fixing  more  than  10  nodes.  The  best+1 
strategy  finds  a  smaller  set  of  nodes  that  controls  a  similar  fraction 
of  the  cycle  cluster,  and  fixing  more  than  7  nodes  results  in  only 
incremental  decreases  in  mc0 0 .  The  Monte  Carlo  strategy  performs 
poorly,  never  finding  a  set  of  nodes  adequate  to  control  a 
significant  fraction  of  the  nodes  in  the  cycle  cluster. 

Conclusions 

Signaling  models  for  large  and  complex  biological  networks  are 
becoming  important  tools  for  designing  new  therapeutic  methods 
for  complex  diseases  such  as  cancer.  Even  if  our  knowledge  of 
biological  networks  is  incomplete,  rapid  progress  is  currently  being 
made  using  reconstruction  methods  that  use  large  amounts  of 
publicly  available  omic  data  [12,13].  The  Hopfield  model  we  use 
in  our  approach  allows  mapping  of  gene  expression  patterns  of 


normal  and  cancer  cells  into  stored  attractor  states  of  the  signaling 
dynamics  in  directed  networks.  The  role  of  each  node  in  disrupting 
the  network  signaling  can  therefore  be  explicitly  analyzed  to 
identify  isolated  genes  or  sets  of  strongly  connected  genes  that  are 
selective  in  their  action.  We  have  introduced  the  concept  of  size  k 
bottlnecks  to  identify  such  genes.  This  concept  led  to  the 
formulation  of  several  heuristic  strategies,  such  as  the  efficiency- 
ranked  and  best+1  strategy  to  find  nodes  that  reduce  the  overlap  of 
the  cell  network  with  a  cancer  attractor.  Using  this  approach,  we 
have  located  small  sets  of  nodes  in  lung  and  B  cancer  cells  which, 
when  forced  away  from  their  initial  states  with  local  magnetic 
fields  (representing  targeted  drugs),  disrupt  the  signaling  of  the 
cancer  cells  while  leaving  normal  cells  in  their  original  state.  For 
networks  with  few  targetable  nodes,  exhaustive  searches  or  Monte 
Carlo  searches  can  locate  effective  sets  of  nodes.  For  larger 
networks,  however,  these  strategies  become  too  cumbersome  and 
our  heuristic  strategies  represent  a  feasible  alternative.  For  tree-like 
networks,  the  pure  efficiency-ranked  strategy  works  well,  whereas 
the  mixed  efficiency-ranked  strategy  could  be  a  better  choice  for 
networks  with  high-impact  cycle  clusters. 

We  make  two  important  assumptions  in  applying  this  analysis  to 
real  biological  systems.  First,  we  assume  that  genes  are  either  fully 
off  or  fully  on,  with  no  intermediate  state.  Modelling  the  state  of  a 
neuron  as  “all-or-none”  has  long  been  accepted  as  a  reasonable 
assumption  [48] ,  which  provided  the  spin  glass  framework  for  the 
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Table  4.  Best  single  genes  and  their  impacts  for  the  p  =  1  and  p  =  2  models. 
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Figure  11.  Final  cancer  magnetizations  for  an  unconstrained  search  on  the  B  cell  network  using  p  =  1.  The  Monte  Carlo  strategy  is 
ineffective  for  fixing  any  number  of  nodes.  The  efficiency-ranked  and  best+1  curves  slowly  separate  because  synergistic  effects  accumulate  faster  for 
best+1. 

doi:1 0.1 371/journal.pone.01 05842.g01 1 


Hopfield  model.  While  similar  switch-like  behavior  in  gene 
regulatory  networks  has  been  proposed  as  an  explanation  of,  for 
example,  segmentation  in  Drosophila  embryos  [49],  assigning  a 
Boolean  value  to  gene  expression  may  be  overly  simplistic  in  many 
cases.  A  model  which  uses  spins  with  more  than  two  projections 
could  prove  to  be  more  realistic  and  predictive.  Second,  we 
assume  that  all  nodes  update  their  status  with  a  single  timescale 
and  with  a  single  interaction  strength.  If  the  signaling  timescale  Zy 
for  each  edge  in  the  biological  network  is  known,  simulations  could 
be  conducted  in  which  a  signal  traveling  along  an  edge  (j,i) 
reaches  its  target  after  Zy  time  steps.  This  would  amount  to  a 
synchronous  update  schedule  with  a  “queue”  of  signals  moving 
between  nodes.  Likewise,  our  model  gives  equal  weight  to  all  edges 
(aside  from  edges  that  are  effectively  deleted  in  the  p  =  2  case), 
whereas  real  gene  regulatory  networks  exhibit  a  spectrum  of 
interaction  strengths.  This  can  easily  be  integrated  with  our  model 
by  using  a  weighted,  directed  adjacency  matrix.  However,  doing 
this  would  surely  require  a  change  in  control  strategy. 

Despite  these  issues,  our  model  shows  promise.  Some  of  the 
genes  identified  in  Table  4  are  consistent  with  current  clinical  and 
cancer  biology  knowledge.  For  instance,  in  the  lung  cancer  list  we 
found  a  well  known  tumor  suppressor  gene  (TP53)  [50]  that  is 
frequently  mutated  in  many  cancer  types  including  lung  cancer 
[51].  Mutations  in  PBX1  have  recently  been  detected  in  non- 
small-cell  lung  cancer  and  this  gene  is  now  being  considered  as  a 


target  for  therapy  and  prognosis  [52].  MAP3K3  and  MAP3K14 
are  in  the  MAPK/ERK  pathway  which  is  a  target  of  many  novel 
therapeutic  agents  [53]  ,  and  SRC  is  a  well  known  oncogene  and  a 
candidate  target  in  lung  cancer  [54],  BCL6  (B-cell  lymphoma  6)  is 
the  most  common  oncogene  in  DLBCL,  and  it  is  known  that  its 
expression  can  predict  prognosis  and  response  to  drug  therapy 
[55-57].  BCL6  is  also  frequently  mutated  in  follicular  lymphoma 
[58,59].  Our  analysis  identified  BCL6  as  an  important  drug  target 
for  both  DLBCL  and  follicular  lymphoma  using  either  naive  or 
memory  B-cells  as  a  control  for  both  p=  1  and  p  =  2.  RBL2 
deregulation  has  been  recently  associated  with  many  types  of 
lymphoma  [60-62].  FOXM1  is  a  potential  therapeutic  target  in 
mature  B  cell  tumors  [63]  and  ATF2  has  been  recently  found  to  be 
highly  disregulated  in  lymphoma  [64,65] .  Besides  BCL6  discussed 
above,  the  N/D  list  for  DLBCL  contains  genes  (MEF2A  [66], 
NCOA1  [67,68],  TGIF1  [69-71],  NFATC3  [72])  that  are  all 
known  to  have  a  functional  role  in  cancer,  even  if  they  have  not 
been  associated  to  the  specific  B-cell  cancer  types  we  have 
considered.  Our  predictions  are  for  the  immortalized  cell  lines  we 
have  selected,  some  of  which  are  commonly  used  for  in-vitro 
testing  in  many  laboratories.  RNAi  and  targeted  drugs  could  then 
be  used  in  these  cell  lines  against  the  top  scoring  genes  in  Table  4 
to  test  the  disruption  of  survival  or  proliferative  capacity.  If 
experimentally  validated,  our  analysis  based  on  attractor  states  and 
bottlenecks  could  be  applied  to  patient-derived  cancer  cells  by 
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Figure  12.  Final  cancer  magnetizations  for  an  unconstrained  search  on  the  B  cell  network  using  p= 2.  The  rather  sudden  drop  in  the 
magnetization  between  controlling  5  and  10  nodes  in  the  efficiency-ranked  strategies  comes  from  flipping  a  significant  portion  of  a  cycle  cluster.  This 
is  the  only  network  examined  in  which  the  mixed  efficiency-ranked  strategy  produces  results  different  from  the  pure  efficiency-ranked  strategy. 
doi:1 0.1 371 /journal. pone.01 05842.g01 2 


integrating  in  the  model  patient  gene  expression  data  to  identify 
patient-specific  targets. 

The  above  unconstrained  searches  assume  that  there  exists 
some  set  of  “miracle  drugs”  which  can  turn  any  gene  “on”  and 
“off”  at  will.  This  limitation  can  be  patially  taken  into  account  by 
using  constrained  searches  that  limit  the  nodes  that  can  be 
addressed.  However,  even  the  constrained  search  results  are 
unrealistic,  since  most  drugs  directly  target  more  than  one  gene. 
Inhibitors,  for  example,  could  target  differential  nodes  with 
1  and  £”  =  +  1,  which  would  damage  only  normal  cells. 
Additionally,  drugs  would  not  be  restricted  to  target  only 
differential  nodes,  and  certain  combinations  could  be  toxic  to 
both  normal  and  cancer  cells.  Few  cancer  treatments  involve  the 
use  of  a  single  drug,  and  the  synergistic  effects  of  combining 
multiple  drugs  adds  yet  another  level  of  complication  to  finding  an 
effective  treatment  [27],  On  the  other  hand,  the  intrinsic 
nonlinearity  of  a  cellular  signaling  network,  with  its  inherent 
structure  of  attractor  states,  enhances  control  [31]  so  that  a 
properly  selected  set  of  druggable  targets  might  be  sufficient  for 
robust  control. 
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Table  SI  Lung  cell  network.  The  column  labeled  “Source 
EzID”  contains  the  Entrez  IDs  of  transcription  factors  and  kinases, 
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Abstract 

Background:  Many  kinase  inhibitors  have  been  approved  as  cancer  therapies.  Recently,  libraries  of  kinase  inhibitors 
have  been  extensively  profiled,  thus  providing  a  map  of  the  strength  of  action  of  each  compound  on  a  large  number 
of  its  targets.  These  profiled  libraries  define  drug-kinase  networks  that  can  predict  the  effectiveness  of  untested  drugs 
and  elucidate  the  roles  of  specific  kinases  in  different  cellular  systems.  Predictions  of  drug  effectiveness  based  on  a 
comprehensive  network  model  of  cellular  signalling  are  difficult,  due  to  our  partial  knowledge  of  the  complex 
biological  processes  downstream  of  the  targeted  kinases. 

Results:  We  have  developed  the  Kinase  Inhibitors  Elastic  Net  (KIEN)  method,  which  integrates  information  contained  in 
drug-kinase  networks  with  in  vitro  screening.  The  method  uses  the  in  vitro  cell  response  of  single  drugs  and  drug  pair 
combinations  as  a  training  set  to  build  linear  and  nonlinear  regression  models.  Besides  predicting  the  effectiveness  of 
untested  drugs,  the  KIEN  method  identifies  sets  of  kinases  that  are  statistically  associated  to  drug  sensitivity  in  a  given 
cell  line.  We  compared  different  versions  of  the  method,  which  is  based  on  a  regression  technique  known  as  elastic  net. 
Data  from  two-drug  combinations  led  to  predictive  models,  and  we  found  that  predictivity  can  be  improved  by 
applying  logarithmic  transformation  to  the  data.  The  method  was  applied  to  the  A549  lung  cancer  cell  line,  and 
we  identified  specific  kinases  known  to  have  an  important  role  in  this  type  of  cancer  (TGFBR2,  EGFR,  PHKG1  and 
CDK4).  A  pathway  enrichment  analysis  of  the  set  of  kinases  identified  by  the  method  showed  that  axon  guidance, 
activation  of  Rac,  and  semaphorin  interactions  pathways  are  associated  to  a  selective  response  to  therapeutic 
intervention  in  this  cell  line. 

Conclusions:  We  have  proposed  an  integrated  experimental  and  computational  methodology,  called  KIEN,  that 
identifies  the  role  of  specific  kinases  in  the  drug  response  of  a  given  cell  line.  The  method  will  facilitate  the  design 
of  new  kinase  inhibitors  and  the  development  of  therapeutic  interventions  with  combinations  of  many  inhibitors. 

Keywords:  Drug  response  predictions,  Kinase  inhibitors,  Elastic  net  regression,  High  throughput  screening,  Drug 
combination  therapies 


Background 

The  important  role  of  kinases  in  cancer  biology  [1]  has 
spurred  a  considerable  effort  towards  the  synthesis  of 
libraries  of  fully  profiled  kinase  inhibitors,  providing  a 
map  of  the  strength  of  each  compound  on  a  large  num¬ 
ber  of  its  potential  targets  [2-4].  In  particular,  a  recently 
published  dataset  has  profiled  several  hundred  kinase 
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inhibitors  using  a  panel  of  more  than  300  kinases  [4]. 
These  profiled  libraries  define  a  network  of  interactions 
between  drugs  and  their  kinase  targets  [5],  and  represent 
a  valuable  resource  for  the  development  of  new  therap¬ 
ies.  In  this  paper,  we  introduce  a  novel  computational 
method  that  incorporates  profiled  libraries  and  in  vitro 
measurements  to  predict  the  response  of  cells  to  previ¬ 
ously  untested  drugs.  Besides  making  prediction  about 
the  cellular  response  to  drugs,  the  method  identifies 
critical  kinase  targets  and  pathways  that  are  statistically 
associated  to  drug  sensitivity  in  a  given  cell  line. 
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Statistical  inference  and  regression  methods  in  con¬ 
junction  with  gene  expression  or  mutations  have  been 
used  to  identify  specific  biomarkers  associated  with  an 
increased  sensitivity/resistance  to  drugs.  For  instance, 
the  sensitivity  to  PARP  inhibitors  of  Ewing’s  sarcoma 
cells  with  mutations  in  the  EWS  gene  and  to  MEK 
inhibitors  in  NR  AS -mutant  cell  lines  with  AHR  expres¬ 
sion  have  been  predicted  using  analysis  of  variance  and 
the  elastic  net  method  [6]  and  then  experimentally 
validated  [7,8].  In  these  analyses,  the  statistical  variable 
associated  to  drugs  was  represented  by  the  half  maximal 
inhibitory  concentration  (IC50)  in  different  cell  lines. 
Elowever,  besides  the  IC50,  there  are  many  other  types  of 
information  that  characterize  chemical  compounds. 
These  types  of  information  can  enhance  the  statistical 
analyses  and  improve  the  accuracy  of  predictions.  For 
instance,  a  method  to  predict  drugs  sensitivity  in  cell 
lines  based  on  the  integration  of  genomic  data  with 
molecular  physico-chemical  descriptors  of  the  drugs  has 
been  recently  proposed  [9].  Another  useful  type  of  infor¬ 
mation  is  the  residual  activity  of  kinases  after  interacting 
with  a  compound.  Kinase  profiling,  patient  genetic  pro¬ 
files,  and  sensitivity  of  primary  leukemia  patient  samples 
to  kinase  inhibitors  were  recently  used  by  Tyner  et  al. 
[10]  to  identify  functionally  important  kinase  targets  and 
clarify  kinase  pathway  dependence  in  cancer. 

In  this  paper,  the  residual  activity  of  kinases  upon  drug 
interaction  is  used  to  make  predictions  of  the  cellular 
response  for  in  vitro  experiments  using  an  elastic  net  [6] 
regression  approach.  This  regression  method  reduces 
the  number  of  predictors  to  a  minimum  set,  providing  a 
clear  picture  of  the  kinases  involved  in  the  response  of 
cell  lines.  A  primary  screen  (single  drug)  and  a  second¬ 
ary  screen  (two-drug  combinations)  are  used  as  the 
training  set  for  the  regression.  The  two-drug  screening 
exhibits  a  broader  distribution  in  the  response  and  pro¬ 
vides  a  good  level  of  predictability.  In  fact,  the  model 
based  only  on  single  drug  response  did  not  pass  the  stat¬ 
istical  cross-validation  test. 

We  are  applying  this  Kinase  Inhibitor  Elastic  Net 
(KIEN)  method  to  predict  cell  viability  of  a  lung  cancer 
cell  line  (A549)  and  a  normal  fibroblast  cell  line  (IMR-90) 
after  drug  treatment.  We  found  that  the  regression  can  be 
improved  through  a  logarithmic  transformation  on  the 
data.  Using  the  results  of  the  regression,  we  identified  a 
set  of  kinases  that  are  strongly  associated  to  a  selective  re¬ 
sponse  of  A549  and  not  IMR-90.  Then,  a  pathway-based 
enrichment  using  Reactome  [11]  revealed  ten  significant 
pathways  using  this  set  of  kinases,  including  axonal  guid¬ 
ance  and  related  semaphorin  interactions  as  top  hits. 

This  paper  is  organized  as  follows:  Section  In  vitro  screen 
of  the  kinase  inhibitor  library  contains  the  experimental 
results  of  the  primary  and  secondary  in  vitro  screening  cor¬ 
responding  to  single  drugs  and  two-drug  combinations. 


These  experimental  results  and  residual  kinase  activity  are 
analyzed  with  Pearson’s  correlation  in  Section  Analysis  of 
correlations.  This  simple  correlation  analysis  gives  a  first 
glance  of  the  kinases  that  are  statistically  associated  to  a  sig¬ 
nificant  change  in  the  viability  of  cancer  and  normal  cell 
lines.  In  Section  Elastic  net  regression,  we  introduce  the 
elastic  net  approach  and  we  present  the  results  of  a  leave- 
one-out  cross  validation  for  predictions  on  single  and  pairs 
of  drugs.  We  also  present  in  this  section  the  results  ob¬ 
tained  using  the  logarithmic  transformation  on  the  vari¬ 
ables  and  a  pathway  enrichment  analysis  using  Reactome 
[11].  The  Discussion  of  the  results  is  in  Section  Discussion, 
conclusions  in  Section  Conclusions,  and  Materials  and 
Methods  in  Section  Materials  and  methods. 

Results 

In  vitro  screen  of  the  kinase  inhibitor  library 

Our  methodology  begins  with  the  high-throughput 
screening  of  single  drug  and  drug  pair  experiments.  The 
244  kinase  inhibitors  (KIs)  of  the  EMD  drug  library  were 
screened  at  1000  nM  individually  and  the  treatment 
lasted  for  72  hours.  To  quantify  a  selective  response  of  a 
cancer  cell  line  with  respect  to  a  control  normal  cell  line, 
we  define  the  selectivity  S  of  a  single  drug  or  drug  com¬ 
bination  as 

w 

where  vN  indicates  the  viability  of  normal  cells  (IMR90) 
after  treatment,  and  vc  the  viability  of  cancer  cells 
(A549)  after  treatment.  From  the  screening  of  the  244 
KIs,  the  top  hit  was  PDK1/Aktl/Flt3  Dual  Pathway 
Inhibitor  (CAS  #  331253-86-2)  as  ranked  by  selectivity 
(Figure  1).  For  the  secondary  screen,  we  used  the  PDK1/ 
Aktl/Flt3  Dual  Pathway  Inhibitor  as  the  starting  point 
and  combined  this  compound  with  the  other  KIs  as  a 
drug  pair  combination.  The  dose  of  PDK1/Aktl/Flt3 
Dual  Pathway  Inhibitor  was  studied  to  ensure  proper 
dosing  range  and  minimize  toxicity.  We  used  125  nM, 
which  maintains  the  normal  cell  line  IMR-90’s  viability 
>90%  (Figure  2).  For  the  other  243  KIs  we  used  the 
standard  dose  of  1000  nM.  Several  pairs  in  the  second¬ 
ary  screen  showed  very  high  selectivity.  The  top  hit  from 
the  secondary  screen  of  the  library  was  Alsterpaullone 
2-cyanoethyl  (CAS  #  852529-97-0)  with  a  selectivity  of 
S  =  6.14  for  the  pair  (Figure  3). 

Analysis  of  correlations 

In  our  second  step,  we  analyzed  the  Pearson’s  correlation 
of  the  primary  and  secondary  screening  with  a  published 
dataset  [4]  containing  target  profiles  for  140  kinase 
inhibitors.  Therefore,  even  though  we  had  a  library  of 
244  KIs  in  the  experimental  screening,  we  were  limited 
to  utilizing  140  KIs  for  the  analysis.  For  each  inhibitor, 
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Kinase  Inhibitor 

Figure  1  Primary  screen  results  of  the  top  ten  most  selective  kinase  inhibitors.  Drugs  are  ranked  based  on  the  IMR-90  to  A549  viability  ratio. 
The  3  digit  codes  identify  the  compounds:  A15:  PDK1/Akt1/Flt3  Dual  Pathway  Inhibitor  (CAS  331253-86-2);  E20:  Cdk/Crk  Inhibitor  (CAS  78421 1-09-2); 
020:  SU9516  (CAS  666837-93-0);  H15:  MEK1/2  Inhibitor  II  (CAS  212631-61-3);  L13:  PI  3-Ka  Inhibitor  VIII  (CAS  372196-77-5);  G10:  Fascaplysin, 
Synthetic  (CAS  1 14719-57-2);  D07:  Cdk2  Inhibitor  II  (CAS  222035-13-4);  C16:  Cdkl/2  Inhibitor  III  (CAS  443798-55-8);  M16:  GSK3b  Inhibitor  XII, 
TWS1 19  (CAS  601514-19-6);  N05:  Reversine  (CAS  656820-32-5).  The  chemical  structure  of  these  compounds  is  given  in  a  Additional  file  2. 


the  dataset  provides  the  residual  activity  (0<A<1)  of 
291  kinases  after  drug  treatment.  This  quantity  is  a 
measure  of  the  strength  of  inhibition  of  a  drug  on  each 
kinase. 

For  each  kinase  k,  we  calculate  the  Pearson’s  correl¬ 
ation,  Ck,  between  the  selectivity  .S',  and  the  activities 
Akt  h  with  i  e  {1,  indicating  the  single  drug  or  drug 

pair  in  the  set.  For  drug  pairs,  the  activity  is  estimated  as 
a  product  of  the  residual  activities  of  the  two  drugs.  The 
kinases  are  then  ranked  based  on  the  /7-value  of  their 
correlation  with  selectivity,  and  we  calculate  the  False 
Discovery  Rate  (FDR)  adjusted  p  value  [12].  The  list  of 
kinases  mostly  correlated  to  the  selectivity  from  the 
primary  and  secondary  screen  are  listed  in  Table  1.  We 
also  did  calculations  of  the  correlation  between  the  normal 
or  cancer  cell  viability  and  the  activities.  The  results 
for  the  top  kinase-viability  correlations  for  the  primary 


and  secondary  screen  are  shown  in  the  supplementary 
materials  (Additional  file  1:  Table  SI). 

Elastic  net  regression 

Next,  we  build  a  regression  model  that  predicts  the 
response  of  a  cell  line  to  a  drug  or  drug  combination 
i.  The  response  we  predict  is  the  normal  and  cancer  cell 
viability,  from  which  the  selectivity  can  be  derived.  For 
this  purpose,  we  define  a  regression  problem  in  which 
we  use  the  residual  activity  of  the  kinase  k  under  the 
effect  of  drug  i,  which  we  indicate  as  Aki  „  as  predictors 
of  the  viability.  The  response  can  be  written  as 

Vi  =  Pa  +  ftAi,  ;  +  •••+  P,Ap,  i  ■  (1) 

A  fitting  procedure  based  on  a  training  set  of  measure¬ 
ments  produces  the  coefficients  (/So. /Si,  Equation 


PDK1/Aktl/Flt3  Dual  Pathway  Inhibitor  Dose  Response 


■IMR-90 

■A549 


Figure  2  Dose  response  curve  of  PDK1/Akt1/Fit3  dual  pathway  inhibitor.  Different  doses  of  PDK1/Akt1/Flt3  Dual  Pathway  Inhibitor  were 
tested  to  measure  the  response  of  A549  to  the  drug.  For  the  secondary  screen  we  selected  125nM  to  ensure  low  toxicity  on  the  normal  cell  line. 

s _ J 
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Drug  Pairs 

Figure  3  Secondary  screen  results  of  the  top  ten  most  selective  drugs  (1000  nM)  when  paired  with  PDK1/Akt1/Flt3  dual  pathway 
inhibitor  at  125  nM.  Selectivity  is  the  IMR-90  to  A549  viability  ratio,  as  defined  in  Section  2.1.  The  3  digit  codes  identify  the  compounds:  A12: 
Alsterpaullone,  2-Cyanoethyl  (CAS  852529-97-0);  D17:  Cdk2/9  Inhibitor  (CAS  507487-89-0);  K08:  K-252a,  Nocardiopsis  sp.  (CAS  97161-97-2);  021: 
Staurosporine,  Streptomyces  sp.  (CAS  62996-74-1);  P15:  WHI-P180,  Hydrochloride  (CAS  211555-08-7);  E13:  Go  6976  (CAS  136194-77-9);  C09: 
Compound  56  (CAS  171745-13-4);  A10:  Alsterpaullone  (CAS  237430-03-4);  003:  AG  1478,  Selective  inhibitor  of  epidermal  growth  factor  receptor 
(EGFR)  protein  (CAS  175178-82-2);  N05:  Reversine  (CAS  656820-32-5).  The  chemical  structure  of  these  compounds  is  given  in  a  Additional  file  2. 


(1)  can  then  be  used  to  predict  the  viability  of  a  new 
drug  that  has  not  been  tested,  but  of  which  the  profiling 
information  is  available.  Note  that  we  are  integrating 
two  different  types  of  data:  kinase  profiling  data  is  ob¬ 
tained  through  enzymatic  assays  that  probe  directly  the 
interaction  between  drug  and  kinases,  while  the  in  vitro 
cell  response  data  is  the  result  of  complex  signaling  that 
involves  many  pathways  downstream  of  the  affected 
kinases.  The  coefficients  jik  can  be  seen  as  a  measure  of 
the  sensitivity  of  a  given  cell  line  due  to  alterations  in 
the  activity  of  kinase  k. 

It  is  well  known  that  the  least  square  method  does  not 
perform  well  in  the  case  of  linear  regression  with  many 
predictors.  In  our  case,  we  would  like  to  use  a  database 
of  drugs  that  have  been  profiled  on  about  300  kinases. 


However,  it  would  be  desirable  to  select  and  keep  in  the 
final  model  a  minimal  set  of  the  kinases  that  provide  a 
simple  model,  useful  to  gain  biological  insight.  The  lasso 
technique  [13]  is  a  powerful  method  to  reduce  the  num¬ 
ber  of  predictors  by  imposing  a  penalty  on  the  regres¬ 
sion  coefficients.  However,  in  the  presence  of  a  group  of 
kinase  predictors  with  strong  mutual  correlation,  the 
lasso  could  select  only  one  kinase  predictor  from  the 
group  while  missing  the  others.  To  prevent  this  problem, 
our  method  uses  the  elastic  net  approach.  This  method 
incorporates  the  lasso  penalty  as  well  as  a  ridge  penalty 
to  keep  the  regression  coefficients  small  without  com¬ 
pletely  removing  them  [6].  The  weights  of  the  ridge  and 
lasso  penalties  in  the  least  square  procedure  can  be  opti¬ 
mized  for  best  performance  of  the  method. 


Table  1  Correlations  between  selectivity  and  kinase  activity  from  primary  and  secondary  screening 


Kinase 

Primary  screening 

Selectivity  corr 

FDR 

Kinase 

Secondary  screening 

Selectivity  corr 

FDR 

PRKCZ 

0.451 

2.28E-08 

TGFBR2 

-0.501 

8.29E-08 

DMPK 

0.435 

7.75E-08 

CDK4 

-0.412 

6.40E-05 

STK39 

0.430 

1.15E-07 

CDC42BPB 

-0.409 

6.40E-05 

EPHA8 

0.420 

2.33E-07 

RIPK2 

-0.399 

7.73E-05 

ADRBK2 

0.399 

1.0  IE-06 

DSTYK 

-0.369 

0.000413 

PRKACG 

0.396 

1 .27E-06 

ACVRL1 

-0.368 

0.000413 

CAMK4 

0.394 

1 .45E-06 

PAK1 

-0.367 

0.000413 

MAP2K2 

0.393 

1 .53E-06 

MAPKAPK2 

-0.364 

0.00041 3 

ADRBK1 

0.392 

1 .62E-06 

PAK7 

-0.359 

0.000424 

PNCK 

0.382 

3.29E-06 

CDK1 

-0.357 

0.000429 

A  negative  correlation  indicates  that  inhibition  of  that  particular  kinases  is  associated  to  a  higher  selectivity.  The  top  two  hits  with  negative  correlation,  TGFBR2 
and  CDK4  are  known  to  have  an  important  role  in  cell  proliferation,  invasion  and  metastasis  in  lung  adenocarcinoma  [21,22]. 
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We  show  in  Figure  4(a)  and  (b)  the  results  of  a  leave 
one  out  cross  validation  (LOOCV)  method  for  the 
primary  (a)  and  secondary  screen  (b).  For  each  of  the 
140  drugs,  we  apply  the  elastic  net  method  using  the 
remaining  139  drugs  and  then  we  compare  the  result 
to  the  measured  value.  This  cross  validation  method  is 
a  particular  case  of  the  more  general  £-fold  cross  valid¬ 
ation  procedure  in  which  k  is  equal  to  the  size  of  the 
training  set  [14].  The  cross  LOOCV  shows  that  the 
information  contained  in  the  primary  screen  is  not 
sufficient  to  define  a  predictive  model.  The  fact  that 
some  kinases  in  Table  1  show  some  significant  correl¬ 
ation  with  the  response  when  considered  individually 
is  in  general  not  a  sufficient  condition  for  defining  a 
predictive,  multiple  regression  model.  On  the  other 
hand,  the  secondary  screen  is  able  to  reproduce  the 
viability  of  many  drugs,  especially  the  ones  with  the 
stronger  effect  on  both  cell  lines.  Overall,  the  data 
from  the  secondary  screen  presents  a  much  broader 
distribution  with  a  tail  representing  a  few  drug  combi¬ 
nations  particularly  effective.  The  regression  works  better 
in  identifying  these  highly  effective  pairwise  combinations 
and  the  relative  ranking  of  their  strengths.  Data  is  not 
particularly  informative  for  drugs  and  drug  pair  combi¬ 
nations  that  are  not  effective,  which  concentrate  in  the 
neighborhood  of  ~  1. 

Data  transformations  can  represent  a  powerful  strategy 
to  improve  regression.  We  applied  a  logarithmic  trans¬ 
formation,  which  is  consistent  with  the  hypothesis  of  an 
independent  action  on  the  different  kinases  on  the  total 


viability.  In  this  case  we  assume  that  the  viability  can  be 
rewritten  in  the  form 

Vi  =  eA  (2) 

By  applying  a  log  transformation  on  both  sides  of 
Eq.  (2)  we  reduce  the  problem  to  a  linear  regression,  to 
which  the  elastic  net  strategy  can  be  applied.  We  show 
in  Figure  5  the  results  of  the  LOOCV  for  the  primary 
and  secondary  screen  using  the  logarithmic  data  trans¬ 
formation.  As  in  the  linear  case,  we  find  that  the 
method  fails  the  cross  validation  procedure  if  we  use 
data  from  the  primary  screen,  while  the  secondary 
screen  with  log  transformed  data  gives  better  R2. 

In  addition  to  a  regression  model  that  can  be  used  to 
predict  the  efficacy  of  drugs  that  have  not  been  tested, 
the  Pi  coefficients  can  be  used  to  rank  kinases  in  terms 
of  their  relevance  in  the  regression.  Therefore,  these 
coefficients  identify  the  kinases  whose  inhibition  is  asso¬ 
ciated  to  a  decrease  in  the  cell  viability.  A  ranking  based 
on  the  differential  ,  where  the  index  N  and  C 

identify  the  regression  model  of  the  cancer  and  normal 
cells,  gives  insight  on  specific  pathways  important  for  a 
selective  response  of  cancer  cells.  Table  2  shows  a  list  of 
kinases  ranked  in  terms  of  \pf -ft?  | ,  where  the  coeffi¬ 
cients  have  been  obtained  using  the  logarithmic  data 
transformation  on  the  secondary  screen. 

In  order  to  test  whether  selected  pathways  were 
significantly  enriched  for  the  identified  kinase  genes 
in  Table  2,  a  pathway-based  enrichment  analysis  was 
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Figure  4  Leave-one-out  Cross  Validation  of  the  elastic  net  regression  model  based  on  the  primary  (top)  and  secondary  (bottom) 
screens  for  normal  and  cancer  cell  lines.  Each  of  the  140  point  in  these  figures  corresponds  to  one  of  the  140  drug.  "Regression"  refers  to  the 
viability  predicted  by  the  regression  model  using  all  data  from  the  other  139  drugs  as  training  set,  while  "Measured"  refers  to  the  actual  viability 
measured  for  the  drug  or  drug  combination.  Note  that  only  the  secondary  screen  leads  to  predictive  models  with  significant  R2  for  the  two 
cancer  cell  types. 
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Figure  5  Leave-one-out  Cross  Validation  of  the  elastic  net  regression  model  based  on  the  primary  (top)  and  secondary  (bottom) 
screens  for  normal  and  cancer  cell  lines  after  logarithmic  transformation  on  the  data.  Each  of  the  140  point  in  these  figures  corresponds 
to  one  of  the  140  drugs.  "Regression"  refers  to  - log  of  the  viability  predicted  by  the  regression  model  using  all  data  from  the  other  139  drugs  as 
training  set,  while  "Measured”  refers  to  -log  of  the  actual  viability  measured  for  the  drug  or  drug  combination.  Note  that,  as  in  Figure  4,  only  the 
secondary  screen  leads  to  predictive  models  with  significant  R2  for  both  cell  types.  The  R2  for  the  Cancer  cell  lines  is  considerably  better  using  the 
log  transformation. 


conducted  using  the  results  from  the  elastic  net  kinase 
analysis  and  Fisher  exact  tests.  Ten  pathways  from 
Reactome  were  identified  as  significant  (p  <  0.05)  using 
this  kinase  list,  including  axon  guidance,  activation  of 
Rac,  and  semaphorin  interactions  as  top  hits  (Table  3). 

Discussion 

Drug-kinase  profiling  represents  a  controller-target 
network  [5]  that  when  combined  with  in  vitro  testing, 
can  be  used  in  regression  models  to  predict  drug 
response  and  to  identify  pathways  statistically  associ¬ 
ated  to  drug  sensitivity.  Network  methods  in  biology 
are  often  based  on  the  analysis  of  large  datasets  from 
high-throughput  experiments.  An  example  is  given  by 
gene  regulatory  networks,  which  presents  many  chal¬ 
lenges  either  when  restricted  to  a  homogeneous  set  of 
data  [15,16]  or  when  it  includes  different  classes  of 
data  [17-20].  In  our  KIEN  method,  information  from 
the  drug-target  network  and  experimental  query  of  the 
biological  system  are  integrated.  The  goal  is  not  a  recon¬ 
struction  of  a  regulatory  network,  but  to  identify  a  set  of 
kinases  linked  to  a  therapeutic  response  in  a  given  cell 
line.  In  order  to  establish  associations,  the  system  has  to 
be  perturbed  by  the  use  of  kinase  inhibitor  drugs.  The 
response  to  these  single  drugs  or  drug  combinations 
becomes  a  training  set  that  when  combined  with  the 
kinase  profiling,  can  lead  to  predictions. 

The  elastic  net  method  is  one  of  the  most  widely  used 
regularization  techniques.  Regularization  techniques 
are  used  in  statistical  and  machine  learning  models 
to  achieve  an  optimal  tradeoff  between  accuracy  and 


simplicity.  Simplicity  makes  a  model  less  prone  to 
overfitting  and  more  likely  to  generalize.  In  our  ana¬ 
lysis,  we  found  that  the  elastic  net  regressions  based  on 
single  drug  responses  were  not  successful,  while  drug 
pair  data  provided  statistically  significant  predictions. 
A  possible  explanation  for  this  finding  is  the  following: 
single  drugs  might  be  less  able  to  overcome  the  robust¬ 
ness  of  biological  networks  [5].  The  phenotypic  signal 
is  therefore  blunted  and  not  easily  measured.  If  a 
second  drug  is  added,  any  compensatory  capacity  is 
already  stretched  and  the  effects  from  the  inhibition  of 
each  kinase  can  be  seen  more  clearly.  Using  data  from 
drug  pairs,  we  found  that  noise  can  be  better  filtered  out 
and  stronger  statistical  associations  between  kinases 
and  therapeutic  response  are  revealed.  Clearly,  if  a 
different  training  set  with  higher  variance  in  efficacy 
measures  were  used  in  the  primary  screen,  it  is  likely 
that  also  single  drug  in  vitro  response  would  have 
given  a  significant  predictive  model. 

We  identified  several  kinases  that  are  implicated  in 
lung  cancer  that  gives  biological  significance  to  our 
KIEN  method.  In  particular,  TGFBR2  appears  as  a  top 
hit  both  in  the  correlation  and  in  the  elastic  net 
methods.  This  finding  is  consistent  with  recent  siRNA 
experiments  on  A549  cell  lines  [21],  which  demonstrated 
that  silencing  of  this  receptor  reduces  cell  proliferation, 
invasion,  and  metastasis.  The  Cyclin-dependent  kinase 
4  (CDI<4)  appears  as  a  second  top  target  in  the  correl¬ 
ation  analysis,  and  is  also  highly  significant  in  the  KIEN 
analysis.  Experiments  using  lentiviral-mediated  shRNA 
to  inhibit  CDI<4  in  A549  have  shown  inhibited  cell  cycle 
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Table  2  Kinases  with  the  highest  difference  in  the 
regression  coefficients  for  the  log  transformed  data  of 
the  secondary  screen 


Kinase 

Cancer  beta 
coefficient 

Normal  beta 
coefficient 

Difference 

TGFBR2 

0.061 

0.000 

0.061 

EGFR 

0.060 

0.000 

0.060 

PHKG1 

0.051 

0.014 

0.037 

RIPK2 

0.032 

-0.002 

0.034 

PRKG2 

0.012 

0.045 

0.033 

CDK4 

0.021 

-0.008 

0.029 

MAP3K10 

0.038 

0.014 

0.024 

MARK4 

0.000 

0.022 

0.022 

PAK1 

0.025 

0.004 

0.021 

MAP4K5 

0.021 

0.000 

0.021 

MARK2 

0.006 

0.026 

0.021 

MARK3 

0.000 

0.020 

0.020 

TBK1 

0.012 

0.031 

0.020 

ERBB2 

0.021 

0.001 

0.019 

NUAK1 

-0.029 

-0.010 

0.019 

ULK2 

0.018 

0.000 

0.018 

MYLK2 

-0.024 

-0.006 

0.018 

MAP4K4 

0.004 

-0.014 

0.018 

CDK5 

0.002 

-0.016 

0.018 

GSK3B 

0.021 

0.004 

0.017 

PAK2 

0.019 

0.002 

0.017 

CDC42BPB 

0.023 

0.006 

0.017 

DSTYK 

0.006 

-0.010 

0.016 

RPS6KA2 

0.000 

-0.016 

0.016 

FGFR1 

-0.004 

0.012 

0.016 

PAK7 

0.015 

0.000 

0.015 

PIM1 

-0.015 

0.000 

0.015 

CDK3 

0.015 

0.000 

0.015 

IRAKI 

-0.002 

-0.017 

0.015 

A  larger  difference  is  associated  with  a  selective  response  of  A549  upon 
inhibition.  Note  that  in  addition  to  TGFB2R  and  CDK4,  which  were  identified 
with  the  correlation  approach  of  Table  1,  additional  kinases  known  to  have  an 
important  role  in  lung  cancer  such  as  EGFR  [24,25]  and  PHKG1  [26]  are  found 
using  the  elastic  net  approach. 

progression,  suppressed  cell  proliferation,  colony  for¬ 
mation,  and  migration  [22],  and  there  is  an  ongoing 
clinical  trial  using  a  CDK4/6  inhibitor  in  lung  cancer 
[23].  The  KIEN  analysis  identified  EGFR,  which  is 
known  to  be  overexpressed  in  the  majority  of  non¬ 
small  cell  lung  cancers  [24].  Furthermore,  RNAi  experi¬ 
ments  targeting  EGFR  demonstrated  cancer  growth 
suppression  in  A549  xenograft  in  mice  [25].  The  third 
kinase  in  Table  2,  PHKG1  has  also  been  found  to  be 
upregulated  in  human  tumor  samples,  including  lung 


Table  3  Reactome  pathways  with  significant 
representation  of  kinases  from  the  regression  analysis 


Path  ID 

Path  name 

Ns 

Nt 

p-val 

422475 

Axon  guidance 

9 

31 

0.005 

428540 

Activation  of  Rac 

3 

5 

0.008 

373755 

Semaphorin  interactions 

4 

10 

0.011 

376176 

Signaling  by  Robo  receptor 

3 

7 

0.024 

1 266738 

Developmental  Biology 

8 

39 

0.026 

445144 

Signal  transduction  by  LI 

4 

13 

0.030 

373760 

LI  CAM  interactions 

4 

14 

0.040 

1 93639 

p75NTR  signals  via  NF-kB 

2 

4 

0.051 

209543 

p75NTR  recruits  signaling  complexes 

2 

4 

0.051 

389359 

CD28  dependent  Vavl  pathway 

2 

4 

0.051 

Ns  indicates  the  number  of  kinases  that  are  found  significant  in  the  regression 
analysis,  while  NT  is  the  total  number  of  kinases  in  the  pathway.  The  top  ten 
pathways  with  Fisher  exact  test  p  <  =0.051  are  shown.  These  pathways  are 
identified  from  518.  Reactome  pathways  containing  at  least  one  of  the  kinases 
identified  in  Table  2.  The  9  kinases  in  the  axon-guidance  pathway  are  EGFR, 
PAK1,  ERBB2,  CDK5,  GSK3B,  PAK2,  RPS6KA2,  FGFR1  and  PAK7. 

adenocarcinoma,  and  aberrations  in  its  gene  copy  num¬ 
ber  is  a  feature  of  many  human  tumors  [26]. 

The  pathway-based  enrichment  provides  a  broader 
view  on  the  role  of  the  kinases  identified  by  our  method 
in  Table  2.  Among  the  top  three  pathways  shown  in 
Table  3  are  activation  of  Rac  and  Semaphorin  interac¬ 
tions.  Rac  proteins  play  a  key  role  in  cancer  signaling 
and  they  belong  to  the  RAS  superfamily  [27].  We  also 
identified  a  set  of  semaphorins  in  our  analysis  that  is 
represented  in  the  top  significantly  enriched  pathways. 
Semaphorins,  previously  known  as  collapsins,  are  a  set 
of  proteins  containing  a  500-amino  acid  sema  domain 
among  others  (including  PSI  and  immunoglobulin  type 
domains),  which  can  be  transmembranous  or  secreted 
[28].  It  is  known  that  Sema3E  cleavage  promotes  inva¬ 
sive  growth  and  metastasis  in  vivo  [28].  These  genes 
also  have  selective  targeting  by  Rac  and  Rho  family 
members.  This  generates  hypotheses  of  possible  path¬ 
ways  that  could  be  targeted  therapeutically.  However, 
these  hypotheses  need  to  be  validated  by  further  experi¬ 
ments  with  different  inhibitors  for  the  same  targets  or 
with  alternative  methods,  e.g.  using  siRNA. 

Conclusions 

We  have  introduced  an  integrated  experimental  and 
computational  methodology  that  identifies  the  role  of 
specific  kinases  in  the  drug  response  of  a  given  cell  line. 
The  key  element  of  our  KIEN  methodology  is  a  multiple 
regression  procedure  that  uses  in  vitro  screen  data  as  a 
training  set.  If  a  new  library  of  kinase  inhibitor  com¬ 
pounds  were  to  be  synthetized  and  profiled,  then  our 
model  would  be  able  to  immediately  estimate  the  effect 
of  these  drugs  on  in  vitro  experiments  on  a  given  cell 
line.  We  have  shown  an  application  to  a  lung  cancer  cell 
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line,  but  our  method  can  be  extended  to  different  cell 
lines.  The  method  will  facilitate  the  design  of  new  kinase 
inhibitors  and  the  development  of  therapeutic  interven¬ 
tions  with  combinations  of  many  inhibitors  [29].  The 
procedure  could  be  extended  to  three  drug  combina¬ 
tions,  if  measurements  for  these  larger  combinations 
were  available.  Finally,  the  method  could  be  extended  to 
regression  models  that  are  specific  of  cancer  cells  with 
the  same  set  of  mutations,  or  it  could  be  directly  used 
with  patient-derived  primary  cells  to  identify  a  personal¬ 
ized  treatment. 

Materials  and  methods 

Materials 

The  primary  screening  of  a  kinase  inhibitor  (KI)  library 
comprised  of  244  KIs  was  purchased  from  EMD  Chemi¬ 
cals,  and  diluted  with  DMSO  to  2  mM  concentrations 
for  high-throughput  screening  purposes.  The  KI  library 
was  stored  at  -80°C.  Additionally,  PDI<1/Aktl/Flt3  Dual 
Pathway  Inhibitor  (CAS  #  331253-86-2)  was  ordered 
from  EMD.  Only  140  out  of  244  were  used  in  the  drug- 
target  network  reconstruction  because  the  drug  profiling 
information  was  available  only  for  these  compounds. 
One  kinase  inhibitor  known  to  affect  the  kinase  targets 
indirectly  was  excluded.  We  provide  in  Additional  file  2 
the  chemical  structure  of  kinase  inhibitors  with  highest 
selectivity  in  the  primary  and  secondary  screening. 

Cell  culture 

Cell  lines  IMR-90  (normal  lung  fibroblast)  and  A549  (lung 
adenocarcinoma)  were  cultured  in  RPMI  1640  (Hyclone) 
supplemented  with  10%  Canadian  characterized  fetal 
bovine  serum  (Hyclone),  1%  200  mM  L-glutamine  (Omega), 
and  1%  penicillin/streptomycin  (Omega).  The  media 
for  the  cells  were  renewed  every  3  days  and  kept  at 
80-90%  confluency.  Cells  were  maintained  in  a  humidi¬ 
fied  environment  at  37°C  and  5%  CO2. 

Kinase  inhibitor  experiments 

IMR-90  (1500  cells/well)  and  A549  (750  cells/well)  were 
seeded  on  384-well  microplates  (Grenier  Bio-One)  and 
incubated  for  3  hours  before  the  addition  of  kinase  in¬ 
hibitor^).  The  reason  that  IMR-90  was  seeded  at  double 
the  cell  density  of  A549  is  due  to  the  difference  in  cell 
division.  IMR-90’s  doubling  time  is  36-48  hours  whereas 
A549’s  is  22  hours.  We  wanted  to  make  sure  that  the 
cells  have  divided  at  least  once  during  the  72  hr  drug 
treatment.  Furthermore,  both  A549’s  and  IMR-90’s  final 
confluency  at  72  hrs  is  90-95%  and  within  the  range  of 
the  ATPlite  lstep  assay.  Additional  file  1:  Figures  SI  and 
S2  show  the  growth  curve  for  both  cell  lines.  IMR-90 
and  A549  cell  lines  were  tested  on  the  same  day  with 
three  replicates  and  the  experiment  was  repeated  three 
times  with  randomized  well  positions  to  reduce  biases. 


ECHO  555  Liquid  Handler  (Labcyte)  was  used  to  dis¬ 
pense  nanoliter  volumes  of  each  KI  to  384-well  plates 
with  cells  attached  (wet  dispense).  The  final  volume  in 
the  plate  is  40uL  and  cells  were  incubated  for  72  hours 
with  KI  treatment. 

ATP  measurements 

ATPlite  IStep  (Perkin  Elmer)  was  used  to  evaluate  the 
cell  number  and  cytotoxicity.  ATP  measurements  were 
done  by  dispensing  20  uL  of  the  ATPlite  IStep  solution 
to  each  well  to  a  final  volume  of  60  uL.  The  plate  was 
placed  on  a  shaker  at  1100  rpm  and  the  luminescence 
activity  was  detected  by  Analyst  GT  Plate  Reader.  The 
percent  (%)  of  control  is  the  quantity  of  ATPlite  lstep 
measurement  of  the  treated  versus  the  untreated  wells 
of  each  individual  cell  type.  The  ATP  standard  was  pre¬ 
pared  with  culture  media  to  final  volume  of  40  uL,  and 
20  uL  of  ATPlite  lstep  reagent  was  added.  Additional 
file  1:  Figure  S3  shows  the  ATP  standard  curve.  The 
plate  was  read  immediately. 

Computational  methods 

Correlations  between  selectivity/viability  and  kinase 
activity  were  calculated  using  the  python  scipy  linregress 
function,  which  also  provide  p-values.  Ranking  the  p-values 
and  directly  applying  the  Benjamini-Hochberg  procedure 
gave  us  the  FDR  values.  The  elastic  net  regression  was 
carried  out  using  the  Scikit-learn  package  [30]  which 
finds  the  coefficients  /j  that  minimize  the  function 

F  =  Ym^v~a^2  +  ap^1  +  II^Hz 

where  v  is  the  vector  of  the  observed  viabilities  and  A  is 
the  matrix  containing  the  residual  activity  of  the  kinases 
from  the  profiling,  and  M  is  the  total  number  of  drugs 
or  drug  combinations  used.  The  parameters  a  and  /j 
determine  the  relative  weights  of  the  lasso  and  ridge 
penalties  quantified  using  Ll  (||  |i  )  and  L2  (||  ||2)  norm, 
respectively.  We  used  a  =  0.15  and  p  =  0.01  in  the  results 
of  Figures  4  and  5  and  in  Table  2.  We  also  tried  other 
values  of  these  parameters,  which  did  not  give  a  signifi¬ 
cant  difference  in  the  results. 

Pathway-based  enrichment 

Reactome  pathways  were  downloaded  using  a  newer  build 
of  the  ‘biomaRt’  library  (v2.12.0)  in  Bioconductor/R 
(v2.15.0).  Gene  symbols  from  the  kinase  list  were  con¬ 
verted  to  Entrez  gene  identifier  numbers  (‘entrezgene’) 
and  mapped  against  the  gene  ids  in  each  Reactome 
pathway.  For  each  pathway,  the  set  of  significant  genes 
enriched  within  any  given  pathway  was  computed  using 
a  Fisher  exact  test.  The  procedure  computes  the  signi¬ 
ficance  (p-value)  of  observing  significant  kinases,  as 
deemed  significant  by  our  method,  within  the  selected 
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pathway.  These  pathways  are  identified  from  518  Reac- 
tome  pathways.  Given  that  our  gene  set  consists 
entirely  of  kinases  and  would  be  generalized  towards 
kinase-specific  effects,  the  set  of  all  kinases  (-300) 
were  selected  for  background  adjustment  and  more 
sensitive  enrichment  of  the  pathways.  This  procedure 
was  repeated  for  each  pathway  to  generate  p-values  and 
pathway  rankings.  False  discovery  rate  [FDR]  values  were 
later  generated  to  further  restrict  significance. 

Additional  files 


Additional  file  1:  Prediction  of  kinase  inhibitor  response  using 
activity  profiling,  in-vitro  screening,  and  elastic  net  regression. 

Additional  file  2:  Chemical  structure  of  drugs  with  the  highest 
selectivity  in  the  primary  and  secondary  screen. 
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